PREFACE



An important concept in the design of many information processing systems - 
such as transaction processing systems, decision support systems, and work- 
flow systems - is that of a graph. In its simplest form a graph consists of a set 
of points (or nodes) and a set of ordered or unordered pairs of nodes (or edges). 
If the pairs of nodes are unordered, the graph is called a simple graph, and if 
they are ordered, the graph is called a directed graph, or digraph. In both cases, 
the graph represents a network through which materials, people, information, 
etc. can flow. The difference is whether the flow is restricted to one direction 
or whether there is no such restriction. 

Simple graphs and digraphs allow for the construction of a variety of di- 
agrammatic system design tools - such as entity-relationship diagrams, func- 
tional dependency diagrams, data flow diagrams, Petri nets, semantic nets, and 
the like. We note that most of these tools are representational, not analytical. 
That is, they provide a convenient and visually appealing format for illustrat- 
ing information infrastructures, while allowing any subsequent analyses to be 
performed by the user. 

Another problem with such graphical structures is that they usually asso- 
ciate individual information elements and not sets of elements. Yet in many 
cases it is necessary to associate sets of elements - such as multiple attributes 
in data relations, multiple variables in decision models, multiple logical vari- 
ables in decision rules, and multiple documents in workflow systems. Further- 
more, it may be necessary to integrate data relations, decision models, deci- 
sion rules, and workflows into an integrated information processing system. 
Two multiple-element structures, hypergraphs and higraphs, allow a few such
representations, but they have their limitations. 

A recently developed graphical structure that overcomes the limitations and 
shows great promise in modeling information processing systems is a meta- 
graph. Metagraphs are more complex than the graph structures described 
above, but they allow representation and analysis of more complex systems. 
Although there is a substantial literature on metagraphs, this is all in the form 
of journal articles and papers in conference proceedings. There have been no 
books presenting a comprehensive picture of the foundations of metagraphs 
and the applicability of these foundations to the design of information processing
systems. This book attempts to fill that gap by providing a single and comprehensive
treatment of metagraphs.

We begin with a brief introduction to metagraphs. A metagraph is a col- 
lection of directed set-to-set mappings. Although this is a simple definition, 
it leads to several powerful theoretical results and several interesting applica- 
tions. We then present the material in this book in two parts. The first devel- 
ops the theoretical results. Although we will include diagrams for purposes 
of exposition, the emphasis will be on the development of a metagraph alge- 
bra. This is a matrix algebra defined over the elements and edges of a meta- 
graph, resulting in incidence and adjacency matrices. This in turn will lead 
to a more sophisticated view of paths in a metagraph, resulting in the con- 
cept of a metapath. We will also be concerned with (1) certain transformations 
of metagraphs, especially the projection of a metagraph to produce a simpler 
metagraph, (2) conditional metagraphs, in which the calculations performed 
early in a metagraph process determine the structure of the later part of the 
metagraph, and (3) submetagraphs that are largely independent of their con- 
taining metagraphs. 

In the second part of the book we will examine four promising applications 
of metagraphs. The first is the modeling of data relations, each of which is 
viewed as a mapping from a set of key elements to a set of content elements. 
The second is the modeling of decision models, each of which is viewed as a 
mapping from a set of input variables to a set of output variables. The third 
is the modeling of decision rules, each of which is viewed as a mapping from 
a set of logical antecedent variables to a set of logical consequent variables. 
The fourth is the modeling of workflow tasks, each of which is viewed as a 
mapping from a set of input documents to a set of output documents. We will 
apply the theoretical results of the first part of the book to the application areas 
of the second part. 

We conclude this book by briefly examining several possible extensions of 
this work. Of special interest is the structuring of the metagraph modeling 
process, which may enhance the body of work on systems analysis and design 
(and also software engineering), the development of a metagraph workbench 
to support such a process, and the possible application of our results, suitably 
enhanced, to social networks. 




Chapter 1 

GRAPHS, HYPERGRAPHS,
AND METAGRAPHS

An important concept in the design of many information processing systems
- such as transaction processing systems, decision support systems, project
management systems, and workflow systems - is that of a graph. In its simplest 
form, a graph consists of a set of elements (or nodes) and a set of ordered 
or unordered pairs of nodes (or edges). A substantial body of theoretical and 
applied research on various types of graphs has made it possible to develop 
powerful analytical tools for systems design. The purpose of this chapter is 
to summarize some of the existing graph-based tools used in this area, and the 
purpose of this book is to present a new graphical structure, called metagraphs, 
that enhances existing structures and overcomes some of their disadvantages. 

We begin in Section 1 by describing some of the traditional uses of graphs
- tools for visualizing relationships between data elements, data aggregates,
data structures, files, documents, and the like. Specifically, we examine entity- 
relationship diagrams, functional dependency diagrams, data flow diagrams, 
and semantic nets. In each of these cases the purpose of the graph is to display 
the structure of data so that a user can infer possible relationships of interest. 
Although it may be possible to use these structures as the basis of an analytical 
model, the purpose of the diagram/network is to assist the user’s intuition in 
understanding important relationships among data elements, aggregates, etc. 

The three remaining sections of this chapter summarize the remainder of the 
book. First, in Section 2 we review graph structures related to metagraphs - 
especially, simple graphs, directed graphs, hypergraphs, higraphs, and Petri 
nets. Then in Section 3 we provide a brief overview of metagraph theory, 
which we will examine in more detail in Chapters 2-6. Finally, in Section 4 
we provide a brief overview of metagraph applications, which we will exam- 
ine in more detail in Chapters 7-10. The ideas in this book are based on a set 
of papers published by the authors in a variety of journals and conference pro- 
ceedings. These papers are included in the references at the end of the book. 

1. GRAPHS AND DATA VISUALIZATION 



We begin by describing three types of graphical structures, used in three 
types of diagramming conventions. The first type of diagramming convention
concerns the static nature of stored data - that is, the structure of databases.
We will examine two approaches to diagramming databases. The first of these 
is based on the assumption that the data base is in relational form, so that the 
files (tables, relations) describe both the entities about which data is recorded, 
such as suppliers and the parts they supply, and the relationships among the 
entities. This results in the entity-relationship approach to data, illustrated in 
Figure 1.1. 

In Figure 1.1 the supplier relation consists of two data attributes: the ID 
of the supplier, which is the key attribute, and the location of the supplier, 
which is the content attribute. Similarly, the part relation consists of two data 
attributes, the part ID, which is the key, and the weight of the part, which is 
the content. Finally, there is a many-to-many relationship between suppliers 
and parts, resulting in an intersection relation with a compound key (i.e., the 
two IDs), along with a content element (the price that the particular supplier 
charges for the particular part). Of course, if all suppliers charged the same 
price for any particular part, then the price would be a content attribute in the 
part relation. Thus, the structure of the data base depends on the structure of 
the real world about which data is being stored and/or the business rules of the 
organization. 

The second approach to diagramming data bases is to focus on the functional
dependencies among the data attributes. This is shown in the functional
dependency diagram in Figure 1.2. We can see that the supplier ID uniquely
determines location, the part ID determines weight, and the two together
determine price. Both diagrams denote the same information, and both can be
augmented with additional semantic information, such as the supply
relationship in this case (i.e., the fact that suppliers supply parts).

Figure 1.1. An entity-relationship diagram.

Figure 1.2. A functional dependency diagram.

Figure 1.3. A data flow diagram.

The two structures outlined above describe the static structure of data in an 
organization, but they do not describe the dynamic nature of data as informa- 
tion flows throughout an organization. A common way of doing this is with a 
data flow diagram, illustrated in Figure 1.3. We assume that an applicant for 
credit submits an application, a credit check is performed, using a credit his- 
tory file, and a report is sent to a credit manager. The diagram illustrates the 
relationships among the sources (applicant) and destinations (credit manager) 
of data, along with the credit check process and the credit history file. This is a
top-level (or Level 0) diagram, which might then be decomposed into lower-level
(Levels 1, 2, etc.) diagrams, and the processes are usually numbered to
make it apparent how the more detailed processes relate to each other. 

Finally, we look at another type of data structure, one that describes rela- 
tionships among concepts. This is captured by a semantic net, illustrated in 
Figure 1.4. The semantic net captures relationships among the concepts, such 
as instance, subclass, and others (e.g., a mouse eats cheese) and allows con- 
cepts to inherit properties from other concepts. For example, since a mouse is 
a mammal and Mickey Mouse is a mouse, then Mickey Mouse is a mammal. 
In addition, Mickey Mouse eats cheese and is an animal. 
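
The inheritance described above can be sketched in a few lines of Python. This is only an illustration: the link from mammal to animal and the dictionary names `IS_A` and `PROPERTIES` are assumptions made for the example, since Figure 1.4 is not reproduced here.

```python
# Hypothetical semantic net loosely based on the Mickey Mouse example.
# Each concept points to the concept it is an instance or subclass of.
IS_A = {
    "Mickey Mouse": "mouse",
    "mouse": "mammal",
    "mammal": "animal",  # assumed link, not taken from the figure
}

# Properties attached directly to a concept.
PROPERTIES = {"mouse": {("eats", "cheese")}}


def ancestors(concept):
    """Return every concept reachable from `concept` by is-a links."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def inherited_properties(concept):
    """Collect the properties of a concept and of all its ancestors."""
    props = set(PROPERTIES.get(concept, set()))
    for parent in ancestors(concept):
        props |= PROPERTIES.get(parent, set())
    return props
```

Here `ancestors("Mickey Mouse")` walks the is-a chain, and `inherited_properties("Mickey Mouse")` picks up the property attached to mouse.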

In summary, simple diagrammatic frameworks, based on graphical struc- 
tures, can be used to illustrate relationships among items of interest by means 
of simple visualization. This allows analysts to structure the systems they must 
deal with and draw inferences about the behavior of these systems. But graphs
can serve not only as a foundation for visualization-based inference; they
can also serve as a foundation for algebraic operations that allow for more
rigorous calculation of properties.

Figure 1.4. A semantic net.



2. GRAPH STRUCTURES 

We next review two traditional graph structures (simple graphs and directed
graphs), three more recent structures (hypergraphs, higraphs, and Petri nets), 
and finally metagraphs. To illustrate these structures, consider a system in 
which there are three input variables: 

Pri = the sale price of a product,
Vol = the sales volume,
Wage = the prevailing wage rate.

There are also two intermediate variables:

Rev = the revenue realized, which depends on the price and the volume,
Exp = the expense incurred, which depends on the volume sold and the
wage rate.

Finally, there are two output variables:

Prof = the realized profit,
Notes = notes payable as a result of borrowings to cover expenses.

We assume that Pri and Vol determine Rev, Vol and Wage determine Exp, 
Rev and Exp determine Prof and Notes, and Exp determines Notes. We note 
that Notes can be determined either from Rev and Exp (along with Prof) or 
directly from Exp. Thus, there is a limited amount of redundancy in this set of 
calculation procedures, which may give the user a limited amount of discretion
in implementing them; however, this could lead to inconsistencies in the
results. 

The traditional graph structures for describing these variables and the
relationships between them are simple graphs and directed graphs (Berge, 1985).
A simple graph is illustrated in Figure 1.5. It consists of seven nodes, one 
corresponding to each of the seven variables defined above, along with seven 
(unordered) pairs of nodes, one for each of the edges (line segments) in the 
figure. Thus, we can see that there is a direct relationship between, for exam- 
ple, Price and Revenue, although the direction of the relationship is not clear. 
There is also an indirect relationship between Price and Profit; Price does not 
directly determine Profit, but Price does determine Profit through Revenue. 
The sequence of edges connecting Price to Profit is called a path. The problem 
is that there is also a path connecting Price to Volume, with Revenue as an in- 
termediate node. Since we do not know the directions of the relationships, we 
might also conclude that Price determines Volume through Revenue, which is 
not the case. 
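
The ambiguity can be seen in a small sketch. The edge list below is only a fragment of the simple graph (three unordered pairs taken from the dependencies stated above, not the full Figure 1.5), and `find_path` is a hypothetical helper that performs an ordinary breadth-first search.

```python
from collections import deque

# A fragment of the simple graph: unordered pairs, so direction is lost.
FRAGMENT = [("Pri", "Rev"), ("Vol", "Rev"), ("Rev", "Prof")]


def find_path(edges, start, goal):
    """Breadth-first search over unordered pairs; returns a node list or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


meaningful = find_path(FRAGMENT, "Pri", "Prof")  # Price to Profit via Revenue
spurious = find_path(FRAGMENT, "Pri", "Vol")     # Price to Volume via Revenue
```

Both searches succeed, and nothing in the data structure distinguishes the meaningful path from the spurious one.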

A more revealing graph is a directed graph, or digraph, in which the edges 
are ordered pairs of nodes, represented visually by arrows. The edges of a 
directed graph describe the directions of the relationships among variables 
(nodes). This is illustrated in Figure 1.6. We can see that Price is necessary 
to determine Revenue, and not vice versa, and there is a path from Price to Profit
through Revenue. But now there is another problem. The directed graph reveals
that Price and Volume determine Revenue, but it is not clear whether
either Price or Volume alone is sufficient to determine Revenue, or whether
both are needed. This can be overcome with AND/OR graphs, in which arcs
spanning the directed edges specify whether the relationships are conjunctive
or disjunctive. However, AND/OR graphs are clumsy for large numbers of
nodes and edges (i.e., variables and relationships), and a less complicated
approach is needed.

Figure 1.6. A directed graph.

Figure 1.7. A hypergraph.

A partial solution is offered by hypergraphs (Berge, 1989). In a hypergraph 
each edge is a set of one or more elements, which allows us to represent re- 
lationships among multiple elements. This is illustrated in Figure 1.7. We can 
see, for example, that Price, Volume, and Revenue are all part of a single
relationship. As before, we can identify paths consisting of sequences of
hypergraph edges connecting variables such as Price and Profit. The problem, as
with simple graphs, is that the edges do not capture any sense of direction.
For example, the hypergraph does not tell us whether Price and Volume are
used to determine Revenue or whether some other relationship is intended -
for example, that Price and Revenue are used to determine Volume.

Figure 1.8. A directed hypergraph.

A solution is to combine the directed character of a digraph with the multi- 
variate character of a hypergraph, resulting in a directed hypergraph, as illus- 
trated in Figure 1.8 (Ramaswami, Sarkar and Chen, 1997). In Figure 1.8, the 
set {Rev, Exp} is called the tail of the edge and the set {Prof, Notes} is called
the head of the edge between them. Using directed hypergraphs, we can define
relationships between sets of variables, such as {Pri, Vol, Wage} and {Prof}.

Another structure is higraphs - or hierarchical graphs (Harel, 1988). A higraph
is a collection of “blobs”, each of which may contain elements and sub-blobs,
which may in turn contain elements and other sub-blobs, etc.
(Figure 1.9). Higraphs have the advantage of flexibility - for example, edges
can originate and terminate within blobs. But this comes at the expense of
analytical complexity. A related structure is statecharts (Harel, 1987), which can
be used to represent sequences of calculations.

Another dynamic structure is Petri nets (Peterson, 1981). Petri nets are directed
graphs consisting of two types of nodes - places and transitions (Figure 1.10).
Places may contain tokens, and when all of the places leading into a
transition are enabled (i.e., contain at least one token), the transition may fire, 
removing a token from each of the places leading into it and placing a token in 
each place leading out of it. The process in Figure 1.10 begins with the transi- 
tions on the left side of the net firing in either order, removing the tokens from 
the Pri, Vol, and Wage places. A token would now appear in the Rev place and 
two tokens would appear in the Exp place. Now the two transitions on the right 
side of the net can fire, again in either order, placing tokens in Prof and Notes 
places. At this point no further transitions can fire and the process terminates. 
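
The firing rule can be sketched as follows. This is a minimal illustration of the rule as stated (one token removed from each input place, one token added to each output place); the single transition `T_REV` and its marking are hypothetical and do not reproduce the full net of Figure 1.10.

```python
def enabled(marking, inputs):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking, inputs, outputs):
    """Return the marking after firing; raise if the transition is not enabled."""
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new = dict(marking)
    for place in inputs:
        new[place] -= 1                      # consume one token per input place
    for place in outputs:
        new[place] = new.get(place, 0) + 1   # produce one token per output place
    return new


# Hypothetical transition: the Pri and Vol places feed the Rev place.
T_REV = (("Pri", "Vol"), ("Rev",))
before = {"Pri": 1, "Vol": 1, "Rev": 0}
after = fire(before, *T_REV)
```

After the firing, the input places are empty, so the same transition is no longer enabled.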
Figure 1.9. A higraph.

Figure 1.10. A Petri net.



Finally, we introduce the structure on which this book will focus - that of
metagraphs, illustrated in Figure 1.11. A metagraph is a set of elements, which
are assumed to be atomic, along with a set of edges. Each edge is an ordered
pair of sets of elements, the first of which is called the invertex and the second
of which is called the outvertex. Thus, metagraphs can be used to model:

• Data bases, in which invertices represent key attributes and outvertices
represent content elements;

• Model bases, in which invertices represent model inputs and outvertices
represent model outputs;

• Rule bases, in which invertices represent antecedent variables and
outvertices represent consequent variables; and

• Workflow systems, in which invertices represent information flows entering
a workstation and outvertices represent information flows emanating
from a workstation.

Figure 1.11. A metagraph.

Of these structures, the one closest to metagraphs is directed hypergraphs, 
in which the edges are also ordered pairs of sets of elements. The principal 
difference between metagraphs and directed hypergraphs is in the type of re- 
search done in these areas. Much of the work done on metagraphs is in deci- 
sion support systems (DSS), and especially model-based DSS and in workflow 
management systems, although, as we will see, metagraphs are also relevant to 
other information structures, such as data management systems and rule-based 
systems. 

We will examine two aspects of metagraphs, corresponding to the two parts 
of the book. The first is Part I: Metagraph Theory, which consists of five chap- 
ters, beginning with Chapter 2. The second is Part II: Applications of Meta- 
graphs, which consists of four chapters. These are described below. 

3. METAGRAPH THEORY (PART I) 

Our purpose in Part I of this book is to present fundamental constructs
(definitions, theorems, and interpretations of their significance) in a way that is
independent of any problem context. This will provide a background for Part II
in that it will provide the mathematical underpinnings essential to understand- 
ing the role that metagraphs play in analyzing the applications. But it will also 
provide an understanding of metagraphs that may be of assistance to anyone 
considering other application areas for which metagraphs may be useful. 

• Chapter 2, The Algebraic Structure of Metagraphs, presents the use 
of matrices to describe metagraphs. An example is the adjacency ma- 
trix, a square matrix with one row and one column for each element in 
the generating set. Each member of the matrix is a set of triples, one for 
each edge connecting the row element to the column element. The triples 
define the invertex, outvertex, and the edge. We define addition and mul- 
tiplication operators for the adjacency matrix, which allows us to define 
a transitive closure (i.e., a sum of powers) of the matrix. This will form 
the basis for a specification of the connectivity properties of metagraphs 
to be discussed in the next chapter. 

• Chapter 3, Connectivity Properties of Metagraphs, examines the prin- 
cipal use of metagraphs discussed in this book. This is to determine 
whether there is a path connecting one set of elements to another set 
of elements. The definition of paths used in simple graphs and directed 
graphs, in which a path is a sequence of edges connecting a source ele- 
ment to a target element, does not apply here. Rather we define a meta- 
path, which is a set (rather than a sequence) of edges connecting a set of 
source elements to a set of target elements, and this allows us to represent 
the parallelism found in more complex systems. In addition, we define 
metapath dominance, in which superfluous input elements and superflu- 
ous edges do not appear. We also investigate cycles, which are metapaths 
from a set of elements to itself. 

• Chapter 4, Metagraph Transformations, examines the projection of a 
metagraph along a subset of its generating set. The elements in the pro- 
jection consist only of those in the subset, and the edges in the projection 
correspond to metapaths in the original metagraph. Thus, a projection
captures the connectivity relationships in a subset of a metagraph and
represents a view of the metagraph taken by a person who is interested
only in the elements contained in the projection set and the relationships
between them. We also examine the inverse of a metagraph, in which edges
become elements and elements become edges, and two related constructs - the
pseudo-dual metagraph and the element-flow metagraph.

• Chapter 5, Attributed Metagraphs, presents an enhanced view of meta- 
graphs in which additional variables, called attributes, are associated with 
the edges. One type of attribute is a resource, a qualitative or quantitative
variable, which must be present (qualitative variable) or present in sufficient
amount (quantitative variable) for the edge to appear in a metapath.
Another type of attribute is an assumption, a Boolean variable that must 
be true for the edge to appear in a metapath. In this case the assumptions 
will be a part of the generating set - that is, the truth value of an assump- 
tion may be in the outvertex of one of the edges or may be known at 
the start of an analysis. A metagraph containing assumptions is called a 
conditional metagraph. 

• Chapter 6, Sub-metagraphs and Their Properties, examines three 
concepts involving conditional metagraphs: full connectivity, non-redun- 
dancy, and independence. A set of elements B is fully connected to an- 
other set C if for every interpretation of the assumptions (i.e., every pos- 
sible set of truth values for the assumptions) there is at least one metapath 
from B to C. B is non-redundantly connected to C if for every interpreta- 
tion there is at most one metapath from B to C. A sub-metagraph within a 
metagraph, defined by a subset of elements and a subset of edges, is inde- 
pendent of the larger metagraph if the sub-metagraph interacts with the 
larger metagraph only through its (i.e., the sub-metagraph’s) input and 
output elements and not through any intermediate elements. We analyze 
the properties of sub-metagraphs in terms of these three properties. 

4. APPLICATIONS OF METAGRAPHS (PART II) 

Our purpose in Part II of this book is to discuss applications of metagraphs 
to three areas: model management, data and rule management, and workflow 
and process analysis. We conclude by examining possible computational and 
decision support applications of metagraphs in the form of a metagraph work- 
bench. 

• Chapter 7, Metagraphs in Model Management, presents a model base 
as a metagraph, and possibly a conditional metagraph. In this representa- 
tion the models are metagraph edges and the elements in the generating 
set are the input and output variables in the models. The connectivity 
properties of metagraphs can be used to determine whether a specific 
collection of models is sufficient to calculate a set of target variables 
from a set of input variables, possibly under a set of assumptions. Of 
special interest is hierarchical modeling, in which a composite model is 
composed of several base models. For example, a composite model may 
represent manufacturing and marketing relationships extracted from two 
base models. In metagraph terms, the composite model is represented by 
combining projections of the base models. We examine the relationship 
between the sum of two or more projections and the projection of the

• Chapter 8, Metagraphs in Data and Rule Management, extends the
analyses of the previous chapter to encompass metagraph edges as func- 
tional dependencies in data bases and as Horn clauses in rule bases. The 
elements in the generating set are data attributes (in the case of data 
bases) and propositions (in the case of rule bases). Metapaths represent 
inference paths from a set of source variables to a set of target variables 
under most conditions. The requirement is that there be an acyclic meta- 
path from the source elements to the target elements. In addition, meta- 
graphs can be used to uncover implicit integrity constraints in rule bases. 
Thus, metagraphs can be used to integrate data bases, decision models, 
and rules in a decision support system. 

• Chapter 9, Metagraphs in Workflow and Process Analysis, examines 
the use of metagraphs in modeling workflow support systems. In this case 
the elements in the generating set are information elements, often
in the form of paper or electronic documents, and the edges represent 
workstations at which document processing takes place (e.g., extraction 
of credit information from a loan application). We will be concerned with 
the decomposition of workflows (e.g., to identify candidates for outsourc- 
ing) and the synthesis of separate workflows (e.g., to consolidate inter- 
dependent processes). We will also offer comments about the impact of 
these considerations on organizational design. 

• Chapter 10, Conclusion, addresses three issues. The first is the metagraph
modeling process and specifically the life cycle of metagraph construction
and implementation. The second is the concept of a metagraph
workbench that will assist a modeler in constructing and implementing
metagraphs. The third is a discussion of yet another promising metagraph
application area - the use of metagraphs in modeling social networks.

Chapter 2

THE ALGEBRAIC STRUCTURE OF
METAGRAPHS



In Chapter 1, the notion of a metagraph was introduced informally, using visual 
depictions and descriptions. In this chapter, the formal structure of a metagraph 
is defined, and its basic properties are identified. 

1. FORMAL REPRESENTATION OF A METAGRAPH 

Definition 2.1. The generating set of a metagraph is the set of elements
$X = \{x_1, x_2, \ldots, x_n\}$, which represent variables of interest, and which
occur in the edges of the metagraph.

Definition 2.2. An edge $e$ in a metagraph is a pair $e = \langle V_e, W_e \rangle \in E$
(where $E$ is the set of edges) consisting of an invertex $V_e \subseteq X$ and an
outvertex $W_e \subseteq X$, each of which may contain any number of elements. The
different elements in the invertex (outvertex) are coinputs (cooutputs) of each
other.

Definition 2.3. A metagraph $S = \langle X, E \rangle$ is then a graphical construct
specified by its generating set $X$ and a set of edges $E$ defined on the
generating set.

Definition 2.4. A simple path $h(x, y)$ from an element $x$ to an element $y$ is
a sequence of edges $\langle e_1, e_2, \ldots, e_n \rangle$ such that

$x \in \mathrm{invertex}(e_1)$,
$y \in \mathrm{outvertex}(e_n)$, and
for all $e_i$, $i = 1, \ldots, n-1$, $\mathrm{outvertex}(e_i) \cap \mathrm{invertex}(e_{i+1}) \neq \emptyset$.

The coinput of x in the path (denoted coinput(x)) is the set of all other
invertex elements in the path’s edges that are not also in the outvertex of any 
edges in the path, and the cooutput of y (denoted cooutput(y)) is the set of all 
outvertex elements other than y. The length of a simple path is the number of 
edges in the path. 
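
Definition 2.4 and the accompanying notions of coinput and cooutput can be checked mechanically. The sketch below assumes that an edge is stored as a pair of frozensets (invertex, outvertex); the function names are illustrative and are not part of the formal development.

```python
def is_simple_path(path, x, y):
    """Check Definition 2.4 for a sequence of (invertex, outvertex) pairs."""
    if not path:
        return False
    if x not in path[0][0] or y not in path[-1][1]:
        return False
    # Consecutive edges must share at least one element.
    return all(path[i][1] & path[i + 1][0] for i in range(len(path) - 1))


def path_coinput(path, x):
    """Invertex elements other than x that no edge in the path produces."""
    ins = set().union(*(v for v, _ in path))
    outs = set().union(*(w for _, w in path))
    return ins - outs - {x}


def path_cooutput(path, y):
    """All outvertex elements in the path other than y."""
    outs = set().union(*(w for _, w in path))
    return outs - {y}


# Two edges taken from the running example of Chapter 1.
to_rev = (frozenset({"Pri", "Vol"}), frozenset({"Rev"}))
to_prof = (frozenset({"Rev", "Exp"}), frozenset({"Prof", "Notes"}))
path = [to_rev, to_prof]
```

For this path from Pri to Prof, the coinput of Pri is {Vol, Exp}: both must be supplied in addition to Pri before Prof can be calculated.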

Example 2.1. The metagraph in Figure 2.1 can be represented as follows:

$S = \langle X, E \rangle$, where

$X = \{Exp, Notes, Prof, Rev, Pri, Vol, Wage\}$, and

$E = \{\langle \{Pri, Vol\}, \{Rev\} \rangle, \langle \{Vol, Wage\}, \{Exp\} \rangle, \langle \{Rev, Exp\}, \{Prof, Notes\} \rangle, \langle \{Exp\}, \{Notes\} \rangle\}$,

$Invertex(\langle \{Rev, Exp\}, \{Prof, Notes\} \rangle) = \{Rev, Exp\}$,

$Outvertex(\langle \{Rev, Exp\}, \{Prof, Notes\} \rangle) = \{Prof, Notes\}$,

$Coinput(Rev, \langle \{Rev, Exp\}, \{Prof, Notes\} \rangle) = \{Exp\}$,

$Cooutput(Prof, \langle \{Rev, Exp\}, \{Prof, Notes\} \rangle) = \{Notes\}$.

The edges of $S$ can be labeled, so that, for example,
$e_1 = \langle \{Rev, Exp\}, \{Prof, Notes\} \rangle$.

Figure 2.1. An example metagraph.

Note that a single metagraph edge is a singular metagraph. Also, note that 
an edge with a singular invertex and a singular outvertex is isomorphic with 
an edge in a directed graph. 
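
Example 2.1 translates directly into a small data structure. The following is a sketch only: the `Edge` type and the helper names are assumptions made for illustration, with invertices and outvertices stored as frozensets so that edges can themselves be members of a set.

```python
from typing import FrozenSet, NamedTuple


class Edge(NamedTuple):
    invertex: FrozenSet[str]
    outvertex: FrozenSet[str]


def edge(ins, outs):
    """Build an edge from any two iterables of element names."""
    return Edge(frozenset(ins), frozenset(outs))


def coinput(x, e):
    """Other invertex elements of edge e, given that x is in its invertex."""
    if x not in e.invertex:
        raise ValueError(f"{x} is not in the invertex")
    return e.invertex - {x}


def cooutput(y, e):
    """Other outvertex elements of edge e, given that y is in its outvertex."""
    if y not in e.outvertex:
        raise ValueError(f"{y} is not in the outvertex")
    return e.outvertex - {y}


# The metagraph of Example 2.1: generating set X and edge set E.
X = frozenset({"Exp", "Notes", "Prof", "Rev", "Pri", "Vol", "Wage"})
e1 = edge({"Rev", "Exp"}, {"Prof", "Notes"})
E = {
    edge({"Pri", "Vol"}, {"Rev"}),
    edge({"Vol", "Wage"}, {"Exp"}),
    e1,
    edge({"Exp"}, {"Notes"}),
}
```

With these definitions, `coinput("Rev", e1)` and `cooutput("Prof", e1)` give the same sets as the example.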

Simple paths do not describe all of the connectivity properties of metagraphs.
This is illustrated in the metagraph of Figure 2.2, in which there are
two simple paths from $x_1$ to $x_5$, both of which have non-null coinputs. However,
$x_1$ itself is sufficient to calculate $x_5$ if all three edges $e_1$, $e_2$, and $e_3$ are
used. Yet $\langle e_1, e_2, e_3 \rangle$ does not represent a simple path, since there is no
sequence of connected edges consisting of these edges. Rather, this metapath
is the union of edges in two simple paths.

Figure 2.2. Metapath example.

Definition 2.5. Given a metagraph $S = \langle X, E \rangle$, a metapath $M(B, C)$ from
a source $B \subseteq X$ to a target $C \subseteq X$ is a set of edges $E' \subseteq E$ such that (1) each
$e' \in E'$ is on a simple path from some element in $B$ to some element in $C$,
(2) $[\bigcup_{e'} V_{e'} \setminus \bigcup_{e'} W_{e'}] \subseteq B$, and (3) $C \subseteq \bigcup_{e'} W_{e'}$.

There are three differences between simple paths and metapaths:

• First, a metapath is a set of edges and not a sequence of edges. For
example, in Figure 2.2, one metapath from $x_1$ to $x_5$ is
$M(\{x_1\}, \{x_5\}) = \{e_1, e_2, e_3\}$.

• Second, the source and target of a metapath are sets, not elements, as in
simple paths. Of course, these sets may sometimes be singleton sets, as
is the case in Figure 2.2 (with $B = \{x_1\}$ and $C = \{x_5\}$).

• Third, the notion of a coinput does not apply to a metapath, since the 
source set includes all pure inputs. 
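
The three conditions of Definition 2.5 can be tested with a short sketch. Two assumptions are made here. First, the three edges below are hypothetical, chosen only to match the description of Figure 2.2 (two simple paths with non-null coinputs, and a three-edge metapath). Second, condition (1) is checked using simple paths that lie within the candidate edge set itself.

```python
def _reach(edges, seeds, linked):
    """Edges reachable from `seeds` by repeatedly following `linked(a, b)`."""
    found = set(seeds)
    frontier = list(seeds)
    while frontier:
        a = frontier.pop()
        for b in edges:
            if b not in found and linked(a, b):
                found.add(b)
                frontier.append(b)
    return found


def is_metapath(edges, source, target):
    """Check conditions (1)-(3) for a set of (invertex, outvertex) pairs."""
    edges = set(edges)
    if not edges:
        return False
    all_in = set().union(*(v for v, _ in edges))
    all_out = set().union(*(w for _, w in edges))
    # (1) every edge lies on a chain from a source edge to a target edge.
    starts = {e for e in edges if e[0] & source}
    ends = {e for e in edges if e[1] & target}
    forward = _reach(edges, starts, lambda a, b: bool(a[1] & b[0]))
    backward = _reach(edges, ends, lambda a, b: bool(b[1] & a[0]))
    on_path = edges <= (forward & backward)
    # (2) pure inputs are in the source; (3) the target is produced.
    return on_path and (all_in - all_out) <= source and target <= all_out


# Hypothetical edges in the spirit of Figure 2.2.
e1 = (frozenset({"x1"}), frozenset({"x2"}))
e2 = (frozenset({"x1"}), frozenset({"x3"}))
e3 = (frozenset({"x2", "x3"}), frozenset({"x5"}))
```

Under these assumptions, {e1, e2, e3} is a metapath from {x1} to {x5}, while {e1, e3} is not, because x3 would be a pure input outside the source.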

2. THE INCIDENCE AND ADJACENCY MATRICES 

In order to define an algebra for metagraph manipulation, two matrix
representations of a metagraph are needed. These are the adjacency matrix and
the incidence matrix. It is worth noting that, as with traditional graph
structures, each of these matrices is a complete representation of a metagraph, 
and can be derived from the other. 

Definition 2.6. The adjacency matrix $A$ for a metagraph
$S = \langle X, E \rangle$ is an $I \times I$ matrix (where $I = |X|$),
such that for all $i, j \in \{1, \ldots, I\}$,

\[ a_{ij} = \bigcup_k (a_{ij})_k, \]

where

\[ (a_{ij})_k = \begin{cases}
\{\langle V_k \setminus \{x_i\},\ W_k \setminus \{x_j\},\ \langle e_k \rangle \rangle\} & \text{if } x_i \in V_k \wedge x_j \in W_k, \\
\phi & \text{otherwise.}
\end{cases} \]

In other words, the adjacency matrix $A$ of a metagraph is a square matrix
with one row and one column for each element in the generating set $X$. The
$ij$th element of $A$, denoted $a_{ij}$, is a set of triples, one for each
edge $e$ connecting $x_i$ to $x_j$. Each triple is of the form
$\langle CI_e, CO_e, e \rangle$, in which $CI_e$ is the coinput of $x_i$ in
$e$ and $CO_e$ is the cooutput of $x_j$ in $e$.

For example, the adjacency matrix for the metagraph in Figure 2.3 below is
shown in Figure 2.4.
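
Definition 2.6 translates directly into a construction over the edge list.
The sketch below is one possible encoding, assuming the matrix is held as a
sparse dictionary keyed by (x_i, x_j) with absent keys standing for null
cells; the function name and the two sample edges are hypothetical and are
not taken from Figure 2.3.

```python
from collections import defaultdict


def adjacency_matrix(edges):
    """Build the adjacency matrix as a sparse dict of cells.

    edges: dict mapping an edge name to (invertex, outvertex) frozensets.
    Each cell is a set of triples (coinput, cooutput, path), where path is
    a one-element tuple holding the edge name.
    """
    cells = defaultdict(set)
    for name, (invertex, outvertex) in edges.items():
        for xi in invertex:
            for xj in outvertex:
                triple = (frozenset(invertex - {xi}),
                          frozenset(outvertex - {xj}),
                          (name,))
                cells[(xi, xj)].add(triple)
    return dict(cells)


# Hypothetical two-edge metagraph, for illustration only.
sample_edges = {
    "e1": (frozenset({"x1"}), frozenset({"x3", "x4"})),
    "e2": (frozenset({"x2", "x3"}), frozenset({"x5"})),
}
A = adjacency_matrix(sample_edges)
print(A[("x1", "x3")])
```

A sparse dictionary is used here only because most cells of such a matrix
are null; a dense list of lists would serve equally well.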

There is an algebra defined for metagraph adjacency matrices. Given
adjacency matrices $A_1$ and $A_2$, defined for two metagraphs that have the
same generating set, these matrices can be added and multiplied, with the
result in each case being another matrix over the same generating set.
Intuitively, $A_1 + A_2$ represents the adjacency matrix of the union of the
two metagraphs, while $A_1 \times A_2$ represents all paths of length two,
where the first edge is from the first metagraph and the second edge is from
the second metagraph.




Figure 2.3. An example metagraph.

Figure 2.4. The adjacency matrix for the metagraph in Figure 2.3.

Figure 2.5. Adjacency matrix of additional metagraph.



Definition 2.7. Given a generating set $X$ and two metagraphs
$S_1 = \langle X, E_1 \rangle$ and $S_2 = \langle X, E_2 \rangle$ with
adjacency matrices $A_1$ and $A_2$ respectively, then the sum of the two
adjacency matrices is the adjacency matrix of the metagraph
$S_3 = \langle X, E_1 \cup E_2 \rangle$ with components

\[ (A_1 + A_2)_{ij} = a^1_{ij} \cup a^2_{ij}. \]

Note that the two matrices must be defined on the same generating set.
However, this is not a restrictive requirement. If the generating sets of
the two metagraphs are overlapping but not identical, each metagraph can be
defined over a new generating set which is the union of the two generating
sets, and then the above definition can be applied.

As an example, consider the metagraph in Figure 2.3 combined with a
metagraph consisting of two edges,
$e_6 = \langle \{x_3, x_4\}, \{x_6\} \rangle$ and
$e_7 = \langle \{x_4\}, \{x_7\} \rangle$,
which has the adjacency matrix shown in Figure 2.5.

The result of adding the two adjacency matrices gives the adjacency matrix 
of the union of the two metagraphs, and this is shown in Figure 2.6. 

The definition of multiplication of adjacency matrices is computationally 
more complex, since the result is not an adjacency matrix, but rather a matrix 
that identifies paths of length two between elements, as mentioned above. In 
order to define this operator, a number of preliminary concepts need to be 
specified. 



Definition 2.8. The components of an ordered triple $R$ are $\alpha(R)$,
$\beta(R)$ and $\gamma(R)$ respectively (i.e.,
$R = \langle \alpha(R), \beta(R), \gamma(R) \rangle$).

Figure 2.6. The adjacency matrix for the combined metagraph.



Definition 2.9. The operator $Cat(A, B)$ represents the concatenation of two
ordered lists $A$ and $B$.

For example,
$Cat(\langle q, r \rangle, \langle q, s, t \rangle) = \langle q, r, q, s, t \rangle$.

Definition 2.10. The $Trnc(\cdot)$ operator truncates a list when it
encounters a duplicate element.

For example,
$Trnc(\langle a_n, n = 1, \ldots, N \rangle) = \langle a_n, n = 1, \ldots, M \rangle$,
where $Q = \{a_n, n = 1, \ldots, M\}$ is a set of distinct elements and
$a_{M+1} \in Q$.
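
The two list operators can be sketched in a few lines. This is an
illustrative encoding, assuming ordered lists are represented as tuples;
the function names simply mirror the operator names in Definitions 2.9 and
2.10.

```python
def cat(a, b):
    """Concatenate two ordered lists (Definition 2.9)."""
    return tuple(a) + tuple(b)


def trnc(seq):
    """Keep the longest prefix of seq with no repeated element (Definition 2.10)."""
    seen = set()
    kept = []
    for item in seq:
        if item in seen:
            break  # stop at the first duplicate
        seen.add(item)
        kept.append(item)
    return tuple(kept)


joined = cat(("q", "r"), ("q", "s", "t"))
print(joined)        # the concatenation from the example above
print(trnc(joined))  # truncated where q reappears
```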

Definition 2.11. Let $X$ be a generating set and let two metagraphs with
adjacency matrices $A$ and $B$ respectively be defined on this generating
set. Each cell in these matrices is a list of triples, with the $n$th triple
in $a_{ik}$ and the $m$th triple in $b_{kj}$ denoted as $(a_{ik})_n$ and
$(b_{kj})_m$ respectively. Then the '$\circ$' operator defines either an
ordered triple or a null set, as follows:

(1) If $(a_{ik})_n \neq \phi$ and $(b_{kj})_m \neq \phi$, then
$(a_{ik})_n \circ (b_{kj})_m$ is a triple $R$ specified as follows:

(a) $\alpha(R) = (\alpha((a_{ik})_n) \cup \alpha((b_{kj})_m)) \setminus (\beta((a_{ik})_n) \cup \{x_i\})$,

(b) $\beta(R) = (\beta((a_{ik})_n) \cup \beta((b_{kj})_m) \cup \{x_k\}) \setminus \{x_j\}$,

(c) $\gamma(R) = Trnc(Cat(\gamma((a_{ik})_n), \gamma((b_{kj})_m)))$;

(2) Else $(a_{ik})_n \circ (b_{kj})_m = \phi$.

Definition 2.12. Given a generating set $X$ and two metagraphs
$S_1 = \langle X, E_1 \rangle$ and $S_2 = \langle X, E_2 \rangle$ with
adjacency matrices $A$ and $B$ respectively, let $(a_{ik})_n$ and
$(b_{kj})_m$ be the ordered triples in $a_{ik}$ and $b_{kj}$ such that:

\[ a_{ik} = \{(a_{ik})_n,\ n = 1, \ldots, N\} \quad \text{and} \quad b_{kj} = \{(b_{kj})_m,\ m = 1, \ldots, M\}. \]

Then the product of the two adjacency matrices $A$ and $B$ is denoted
$A \times B$ with components

\[ (A \times B)_{ij} = \bigcup_{k=1}^{K} \bigcup_{n=1}^{N} \bigcup_{m=1}^{M} \left( (a_{ik})_n \circ (b_{kj})_m \right). \]



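Example 2.2. Given $a_{13} = \langle \phi, \{x_4\}, \langle e_1 \rangle \rangle$
and $b_{36} = \{\langle \{x_2\}, \{x_5\}, \langle e_2 \rangle \rangle,
\langle \{x_4\}, \phi, \langle e_3 \rangle \rangle\}$, consider the first
combination $(a_{ik})_1 \circ (b_{kj})_1$. Since neither of them is null, we
get a triple as follows:

\[ \alpha((a_{ik})_1 \circ (b_{kj})_1) = (\phi \cup \{x_2\}) \setminus (\{x_4\} \cup \{x_1\}) = \{x_2\}, \]

\[ \beta((a_{ik})_1 \circ (b_{kj})_1) = (\{x_4\} \cup \{x_5\} \cup \{x_3\}) \setminus \{x_6\} = \{x_3, x_4, x_5\}, \]

\[ \gamma((a_{ik})_1 \circ (b_{kj})_1) = Trnc(Cat(\langle e_1 \rangle, \langle e_2 \rangle)) = \langle e_1, e_2 \rangle. \]

Similarly, $(a_{ik})_1 \circ (b_{kj})_2 = \langle \phi, \{x_3, x_4\}, \langle e_1, e_3 \rangle \rangle$.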
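
The arithmetic of Example 2.2 can be reproduced with a short sketch of the
composition operator of Definition 2.11. The encoding is an assumption made
for illustration: a triple is a Python tuple (coinput, cooutput, path), a
null entry is None, and the function names are hypothetical.

```python
def trnc(seq):
    """Truncate a sequence at its first repeated element."""
    seen, kept = set(), []
    for item in seq:
        if item in seen:
            break
        seen.add(item)
        kept.append(item)
    return tuple(kept)


def compose(a, b, xi, xk, xj):
    """Compose a triple a from cell (i, k) with a triple b from cell (k, j)."""
    if a is None or b is None:
        return None  # case (2): either operand is null
    a_in, a_out, a_path = a
    b_in, b_out, b_path = b
    alpha = (a_in | b_in) - (a_out | {xi})
    beta = (a_out | b_out | {xk}) - {xj}
    gamma = trnc(a_path + b_path)
    return (frozenset(alpha), frozenset(beta), gamma)


# Values from Example 2.2, with i = 1, k = 3, j = 6.
a13 = (frozenset(), frozenset({"x4"}), ("e1",))
b36_1 = (frozenset({"x2"}), frozenset({"x5"}), ("e2",))
b36_2 = (frozenset({"x4"}), frozenset(), ("e3",))
print(compose(a13, b36_1, "x1", "x3", "x6"))
print(compose(a13, b36_2, "x1", "x3", "x6"))
```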

Using multiplication, the powers of an adjacency matrix can also be
computed. The $n$th power of $A$ is denoted $A^n$. The $ij$th element of
$A^n$, denoted $a^n_{ij}$, is a set of triples, one for each simple path
$h(x_i, x_j)$ of length $n$ connecting $x_i$ to $x_j$. Each triple is of the
form $\langle CI_h, CO_h, h \rangle$, in which $h$ denotes the sequence of
edges comprising the path, $CI_h$ is the coinput of $x_i$ in $h$ and $CO_h$
is the cooutput of $x_j$ in $h$. The closure of $A$, denoted
$A^* = A + A^2 + \cdots$, represents all simple paths of any length in the
metagraph. The $ij$th element of $A^*$, denoted $a^*_{ij}$, is a set of
triples, one for each simple path $h(x_i, x_j)$ of any length connecting
$x_i$ to $x_j$. Note that the multiplication operator allows any cycle to
be traversed only once. Figure 2.7 shows the closure of the adjacency matrix
in Figure 2.4.
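
A rough sketch of how the closure might be computed follows. It assumes the
sparse dictionary encoding of a matrix (absent key means null cell) and a
stopping rule that is an assumption of this sketch rather than something
stated in the text: powers are accumulated until one more power adds no new
triple. All function names are hypothetical.

```python
def trnc(seq):
    """Truncate a sequence at its first repeated element."""
    seen, kept = set(), []
    for item in seq:
        if item in seen:
            break
        seen.add(item)
        kept.append(item)
    return tuple(kept)


def compose(a, b, xi, xk, xj):
    """Composition of two non-null triples (Definition 2.11)."""
    alpha = (a[0] | b[0]) - (a[1] | {xi})
    beta = (a[1] | b[1] | {xk}) - {xj}
    return (frozenset(alpha), frozenset(beta), trnc(a[2] + b[2]))


def multiply(A, B, X):
    """Product of two matrices over generating set X (Definition 2.12)."""
    P = {}
    for (xi, xk), cell_a in A.items():
        for xj in X:
            for a in cell_a:
                for b in B.get((xk, xj), ()):
                    P.setdefault((xi, xj), set()).add(compose(a, b, xi, xk, xj))
    return P


def add(A, B):
    """Sum of two matrices: cell-wise union (Definition 2.7)."""
    S = {key: set(cell) for key, cell in A.items()}
    for key, cell in B.items():
        S.setdefault(key, set()).update(cell)
    return S


def closure(A, X):
    """Accumulate A + A^2 + ... until no new triple appears (assumed stop rule)."""
    star, power = add(A, {}), A
    while True:
        power = multiply(power, A, X)
        grown = add(star, power)
        if grown == star:
            return star
        star = grown


# Hypothetical chain x1 -> x2 -> x3 built from two single-element edges.
EMPTY = frozenset()
A = {("x1", "x2"): {(EMPTY, EMPTY, ("e1",))},
     ("x2", "x3"): {(EMPTY, EMPTY, ("e2",))}}
A_star = closure(A, {"x1", "x2", "x3"})
print(A_star[("x1", "x3")])
```

The loop terminates because the set of possible triples over a finite
generating set and edge set is finite and the accumulated matrix only grows.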

The addition and multiplication operators on adjacency matrices of
metagraphs also support the properties of associativity and distributivity,
as shown below:

Figure 2.7. The closure of the adjacency matrix in Figure 2.4.



Theorem 2.1. Given a generating set $X$ and three metagraphs defined on
this set with adjacency matrices $A$, $B$, and $C$ respectively, then

(1) $A \times (B \times C) = (A \times B) \times C$,

(2) $(A + B) \times C = (A \times C) + (B \times C)$.

Proof. Since the multiplication operation identifies all paths made up of
an edge in the first metagraph followed by an edge in the second metagraph,
$A \times (B \times C)$ identifies all paths of length three consisting of
an edge from $A$ followed by an edge from $B$ and then an edge from $C$
respectively. This is the same as in $(A \times B) \times C$, which proves
associativity.

To prove the distributive property, if $D = (A \times C) + (B \times C)$, it
suffices to show that for any $i, j$, $d_{ij} = ((A + B) \times C)_{ij}$. In
the following, the notation $(a_{ij})_n$ refers to the $n$th triple in
$a_{ij}$, while $a^n_{ij}$ refers to the entry in the $i$th row and $j$th
column of $A^n$, and $(a^n_{ij})_m$ refers to the $m$th element of
$a^n_{ij}$.

Let $|a_{ij}| = M_1$, $|b_{ij}| = M_2$, and $|c_{ij}| = N$. Also, let
$Y = A + B$ (i.e., $y_{ij} = a_{ij} \cup b_{ij}$). Reorganize $b_{ij}$ so
that for $q \le Q$, $(b_{ij})_q \notin a_{ij}$ and for all $q > Q$,
$(b_{ij})_q \in a_{ij}$. Thus, $|y_{ij}| = M_1 + Q$. Then $y_{ij}$ can be
partitioned into the following sets:

\[ (y_{ij})_p = (a_{ij})_p \quad \text{for } p = 1, \ldots, M_1, \]

\[ (y_{ij})_p = (b_{ij})_{p - M_1} \quad \text{for } p = (M_1 + 1), \ldots, (M_1 + Q). \]

Then,

\[ d_{ij} = (A \times C)_{ij} \cup (B \times C)_{ij}
= \bigcup_{k,m,n} \left( (a_{ik})_m \circ (c_{kj})_n \right) \cup \bigcup_{k,q,n} \left( (b_{ik})_q \circ (c_{kj})_n \right)
= \bigcup_{k,p,n} \left( (a_{ik} \cup b_{ik})_p \circ (c_{kj})_n \right). \]

Thus, $D = (A + B) \times C$, which is the desired result. □

Also, note that the null matrix $D$ (with $d_{ij} = \phi\ \forall i, j$) is
a left and right identity under addition (i.e., $A + D = D + A = A$). This
implies that the set of all adjacency matrices defined on the same
generating set forms a commutative idempotent monoid under addition, while
the set of all non-null adjacency matrices forms a semi-group under
multiplication.

Definition 2.13. The incidence matrix $G$ of a metagraph has one row for
each element in the generating set and one column for each edge. The $ij$th
component of $G$, $g_{ij}$, is $-1$ if $x_i$ is in the invertex of $e_j$, it
is $+1$ if $x_i$ is in the outvertex of $e_j$, and it is $0$ otherwise.
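
As a small illustration of Definition 2.13, the sketch below builds an
incidence matrix as a list of rows. The function name and the sample edges
are hypothetical. Where an element appears in both the invertex and the
outvertex of the same edge, the definition does not say which value
applies; this sketch arbitrarily gives the invertex precedence.

```python
def incidence_matrix(elements, edges):
    """Return G with one row per element and one column per edge.

    elements: ordered list of element names.
    edges: ordered list of (invertex, outvertex) sets.
    """
    G = [[0] * len(edges) for _ in elements]
    for j, (invertex, outvertex) in enumerate(edges):
        for i, x in enumerate(elements):
            if x in invertex:
                G[i][j] = -1  # assumption: invertex wins if x is in both
            elif x in outvertex:
                G[i][j] = 1
    return G


# Hypothetical metagraph: e1 = <{x1}, {x2, x3}>, e2 = <{x2}, {x3}>.
G = incidence_matrix(["x1", "x2", "x3"],
                     [({"x1"}, {"x2", "x3"}), ({"x2"}, {"x3"})])
for row in G:
    print(row)
```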



The incidence matrix for the metagraph in Figure 2.3 is shown in Figure 2.8
below.

Figure 2.8. The incidence matrix for the metagraph in Figure 2.3.

Once the closure $A^*$ of a metagraph's adjacency matrix has been
constructed, it can be used to identify a variety of connectivity features
of that metagraph, as discussed in the next chapter.

3. IDENTIFYING METAPATHS

The adjacency matrix and its closure can be used to find paths and
metapaths. One of the benefits of the metagraph representation (versus
simpler graph representations) is that searches for metapaths can be limited
to only those portions of the $A^*$ matrix that deal with the elements in
the source and target sets, which can substantially reduce the search space.
This is because every metapath from $B$ to $C$ must consist of edges based
on a combination of triples from cells $a^*_{ij}$ such that $x_i \in B$ and
$x_j \in C$. Furthermore, the efficiency of the search procedure now becomes
a function of the number of simple paths between $B$ and $C$ (each
corresponding to a triple in the candidate set), rather than the entire
closure matrix.

Another useful observation that can be exploited is that if there is a
metapath from $B$ to $C$, then there should be triples composed of these
edges in $A^*$ in every column $j$ such that $x_j \in C$. Also, in using the
closure matrix to find metapaths $M(B, C)$, even though there is at least
one triple in every column of $A^*$ corresponding to elements of $C$, it is
not always necessary to examine each triple explicitly, because the triples
include the coinputs and cooutputs for the path that they represent.
Furthermore, if we use a conservative approach that always considers a
minimal number of rows, then the metapaths obtained are all input-dominant.

Definition 2.14. Given a metagraph $S = \langle X, E \rangle$, for any two
sets of elements $B$ and $C$ in $S$, a metapath $M(B, C)$ is said to be
input-dominant if there is no metapath $M'(B', C)$ such that
$B' \subset B$.

Based on the above observations, the procedure to find metapaths can be
described as follows:

1. Select a candidate set of input rows $I$ in $A^*$ such that
$x_i \in B$, $\forall i \in I$, $B = \bigcup_i x_i$. Start with single rows,
and repeat with larger sets progressively in successive iterations.

2. If $\exists x_j \in C$ such that $a^*_{ij} = \phi$, $\forall i \in I$,
then there is no metapath from $\{x_i \mid i \in I\}$ to $C$. Return to
step 1 and repeat with another set of rows.

3. Find a candidate set of triples in cells $a^*_{ij}$ such that $i \in I$,
$x_j \in C$ that forms a cover for $C$ (where a cover for $C$ is a set of
triples $T$ such that $C \subseteq \bigcup_{t \in T} output(t)$). If such a
cover is found, then $\bigcup_{t \in T} path(t)$ comprises an
input-dominant metapath from $B$ ($= \{x_i \mid i \in I\}$) to $C$.

4. Otherwise, return to step 1 and use an alternative candidate set $I$.

The stopping criterion for the procedure (in step 3 after a metapath is found, 
or in step 1 if there are no more candidate sets) depends upon whether the 
desired outcome is one metapath or every metapath. 
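
The growing-candidate-set idea in step 1 can be illustrated without the
closure matrix. The sketch below is not the matrix procedure described
above: it swaps in plain forward chaining over the edge list to decide
whether a target is derivable from a candidate source, and it assumes that
this derivability test is an adequate stand-in for the existence of a
metapath (which may not hold when cycles are present). All names and the
sample edges are hypothetical.

```python
from itertools import combinations


def derivable(source, target, edges):
    """True if every target element is produced by firing edges from source."""
    known, produced = set(source), set()
    changed = True
    while changed:
        changed = False
        for invertex, outvertex in edges:
            if invertex <= known and not outvertex <= produced:
                produced |= outvertex
                known |= outvertex
                changed = True
    return set(target) <= produced


def input_dominant_sources(B, C, edges):
    """Smallest subsets of B from which C is derivable, tried by increasing size."""
    found = []
    for size in range(1, len(B) + 1):
        for candidate in combinations(sorted(B), size):
            candidate = set(candidate)
            if any(f < candidate for f in found):
                continue  # a proper subset already works, so skip
            if derivable(candidate, C, edges):
                found.append(candidate)
    return found


# Hypothetical metagraph, for illustration only.
edges = [(frozenset({"x1"}), frozenset({"x2"})),
         (frozenset({"x1"}), frozenset({"x3"})),
         (frozenset({"x2", "x3"}), frozenset({"x5"}))]
print(input_dominant_sources({"x1", "x2", "x3"}, {"x5"}, edges))
```

Stopping at the first hit gives one input-dominant source; exhausting the
candidates gives all of them, matching the two stopping criteria above.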




Chapter 3 

CONNECTIVITY PROPERTIES OF
METAGRAPHS



In this chapter, we further develop the connectivity features of paths and 
metapaths introduced in Chapter 2. In particular, we introduce the notions of 
bridges, cycles and the properties of dominance. 

1. DOMINANT METAPATHS 

The property of dominance is useful in determining whether a metapath has 
any unnecessary components (edges or elements). We introduce the concept 
constructively, based on the following definitions: 

Definition 3.1. Given a metagraph $S = \langle X, E \rangle$, for any two
sets of elements $B$ and $C$ in $S$, a metapath $M(B, C)$ is said to be
edge-dominant if no proper subset of $M(B, C)$ is also a metapath from $B$
to $C$.

Definition 3.2 (also Definition 2.14). Given a metagraph
$S = \langle X, E \rangle$, for any two sets of elements $B$ and $C$ in $S$,
a metapath $M(B, C)$ is said to be input-dominant if there is no metapath
$M'(B', C)$ such that $B' \subset B$.

In other words, edge-dominance (input-dominance) ensures that none of the
edges (elements) in the metapath is superfluous or dispensable. Based on
these concepts, we can then define a dominant metapath as follows:

Definition 3.3. Given a metagraph $S = \langle X, E \rangle$, for any two
sets of elements $B$ and $C$ in $S$, a metapath $M(B, C)$ is said to be
dominant if it is both edge-dominant and input-dominant.

We can illustrate these concepts using an example. Consider the metagraph 
shown in Figure 3.1, which consists of seven edges interconnecting nine ele- 
ments. 

In this metagraph, M1({x1, x4}, {x6}) = {e1, e2, e3} is an edge-dominant
metapath from {x1, x4} to {x6}. It is also an input-dominant metapath, since
there is no metapath from any proper subset of {x1, x4} to {x6}. Thus, by
definition it is a dominant metapath for that given source-target pair of
sets as well.

Consider now the metapath M2({x3, X4, xg}, {xg}) = {e^}. Since it is a single
edge, this metapath is by definition an edge-dominant metapath (since it
does not have any non-empty proper subset of edges). However, it is not an
input-dominant metapath from its invertex to its outvertex. This is because
there exists a metapath M3({x4, xg}, {xg}) = {e?, 62, e^, 64, e^}. Even
though this metapath involves several edges, its pure inputs are a proper
subset of the invertex of e6. This illustrates that even a single edge can
be dominated by some other set of edges, and in this sense metagraphs are
distinctly different from simpler graph structures.

The algebraic representation of a metagraph in terms of its adjacency matrix 
A can be used to identify dominant metapaths. In Chapter 2 , we discussed how 
all the metapaths between two sets B and C can be found from the A and A* 
matrices. 

Definition 3.4. An edge $e$ is said to be non-redundant in a metapath
$M(B, C)$ if there is some $Y \subseteq C$ such that for every metapath $M'$
from $B$ to $Y$, $e \in M'(B, Y)$.

Once again, in Figure 3.1, we see that e1 is non-redundant in the metapath
M1({x1, x4}, {x5, x6}) = {e1, e2, e3}, since it is in every metapath from
the set {x1, x4} to {x5}. At the same time, the same edge e1 is redundant in
the metapath M4({x 4, xg}, {x5, xe}) = {ei, ^2, es, 64, e?}, since the
subset {x5} is now reachable by ej already, so that e1 is no longer needed.

Theorem 3.1. A metapath $M(B, C)$ is edge-dominant iff each of its edges is
non-redundant in $M(B, C)$.

Proof. Since every edge $e$ in the metapath is non-redundant, removal of $e$
will disconnect at least one of the elements in $C$ from $B$. Thus, every
edge in $M$ is essential for the metapath, making the metapath
edge-dominant. □

In Chapter 2, we showed how the algebraic representation of a metagraph's
structure can be used to find metapaths. Testing a candidate metapath for
dominance is straightforward. Since the procedure implicitly generates
input-dominant metapaths, only edge-dominance needs to be tested. By
eliminating each edge in turn from the metapath and testing the resulting
target set against $C$, the edge-dominance of $M(B, C)$ can be tested with a
number of such tests proportional to $|M(B, C)|$.
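
The edge-elimination test can be sketched as follows. This is an
illustration under stated assumptions: the "resulting target set" is
computed by forward chaining from the source using only the remaining
edges, and the function names and the four-edge sample metagraph are
hypothetical.

```python
def reaches(edge_names, source, target, edges):
    """True if the named edges, fired from source, produce every target element."""
    known, produced = set(source), set()
    pending = set(edge_names)
    progress = True
    while progress:
        progress = False
        for name in sorted(pending):  # iterate over a copy; pending shrinks
            invertex, outvertex = edges[name]
            if invertex <= known:
                known |= outvertex
                produced |= outvertex
                pending.discard(name)
                progress = True
    return set(target) <= produced


def is_edge_dominant(metapath, source, target, edges):
    """Drop each edge in turn; the metapath is edge-dominant if every drop fails."""
    metapath = set(metapath)
    if not reaches(metapath, source, target, edges):
        return False  # not a metapath to begin with
    return all(not reaches(metapath - {e}, source, target, edges)
               for e in metapath)


# Hypothetical metagraph, for illustration only.
EDGES = {
    "e1": (frozenset({"x1"}), frozenset({"x2"})),
    "e2": (frozenset({"x1"}), frozenset({"x3"})),
    "e3": (frozenset({"x2", "x3"}), frozenset({"x5"})),
    "e4": (frozenset({"x1"}), frozenset({"x5"})),
}
print(is_edge_dominant({"e1", "e2", "e3"}, {"x1"}, {"x5"}, EDGES))
print(is_edge_dominant({"e1", "e2", "e3", "e4"}, {"x1"}, {"x5"}, EDGES))
```

The loop performs one reachability test per edge of the metapath, which is
where the dependence on $|M(B, C)|$ comes from.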

2. CUTSETS AND BRIDGES 

The notion of a bridge addresses the extent of connectivity between sets of
elements in a metagraph. It is defined using a related concept, a cutset.

Definition 3.5. Given two sets of elements $B$ and $C$ in a metagraph
$S = \langle X, E \rangle$, such that there is a metapath $M(B, C)$, a set
of edges $E'$ is a cutset between $B$ and $C$ if there is no metapath from
$B$ to $C$ in $S' = \langle X, E \setminus E' \rangle$; furthermore, there
is no proper subset of $E'$ that is also a cutset between $B$ and $C$.

Definition 3.6. A singleton cutset between two element sets $B$ and $C$ is a
bridge between them.

These concepts are useful in many applications, since they allow designers
and analysts to focus on edges that are critical. Note that a cutset does
not just affect a particular metapath $M(B, C)$ between the source and
target sets $B$ and $C$; it removes every metapath between them. It should
also be easy to see that there could be multiple cutsets between any pair of
sets. Intuitively, any set of edges made up of one edge from every
edge-dominant metapath between $B$ and $C$ comprises a cutset between $B$
and $C$. Thus, if there are two edge-disjoint metapaths $M_1(B, C)$ and
$M_2(B, C)$ between $B$ and $C$, then the number of cutsets between $B$ and
$C$ is $|M_1| \cdot |M_2|$.

On the other hand, even though there may be many cutsets between two sets
$B$ and $C$ in a metagraph, there may still be no bridges between them. For
instance, in the above example of two edge-disjoint metapaths between $B$
and $C$, there is clearly no bridge between $B$ and $C$.

We can also illustrate these concepts using our example metagraph in
Figure 3.1. In that figure, we see that {x4, xg} is connected to both {x7}
and {xg}. It turns out that there is no bridge between {x4, xg} and {xg},
and there is at least one metapath between them even if any single edge in
the metagraph is removed. On the other hand, each of the edges e2, e4, and
ei is a bridge between {x4, xg} and {x7}, since the removal of any of these
edges disconnects the source from the target.

The following theorems provide some useful properties of bridges in meta- 
graphs: 

Theorem 3.2. Let B and C be two disjoint sets of elements in a metagraph, 
such that there is at least one metapath from B to C. If there is no bridge 
between B and C, then there are at least two edge-dominant metapaths from 
B to C. 

Proof. Since $C$ can be reached from $B$ via a metapath, there has to be at
least one edge-dominant metapath, say $M'$, from $B$ to $C$. However, none
of the edges in $M'$ is a bridge, so for each edge $e$ in $M'$, there is at
least one metapath $M''(B, C)$ in $S' = \langle X, E \setminus \{e\} \rangle$.
However, since $M'$ is edge-dominant, $M'' \not\subset M'$. Since every
metapath can be reduced to an edge-dominant metapath, it follows that $M''$
can be reduced to an edge-dominant metapath that is distinct from $M'$,
which proves the result. □

In the following, we use the notation $B \rightarrow C$ to denote that there
is at least one metapath $M(B, C)$ from $B$ to $C$.

Theorem 3.3. Given a bridge $b$ between sets $B$ and $C$ in a metagraph:

1. For any subset $D$ of $X$ such that $B \rightarrow D$ and
$D \rightarrow C$, $b$ is a bridge between either $B$ and $D$ or between $D$
and $C$.

2. If $M_1$ and $M_2$ are two metapaths from $B$ to $C$, then
$b \in M_1 \cap M_2$ and thus no two metapaths from $B$ to $C$ are
edge-disjoint.

3. For any two sets $B'$, $C'$ such that $B' \rightarrow C'$, and
$B' \subseteq B$, $C \subseteq C'$, $b$ is a bridge between $B'$ and $C'$.

Proof. (Part 1; proof by contradiction) Assume that $b$ is not a bridge
between either $B$ and $D$ or $D$ and $C$. This implies that in the
metagraph $S' = \langle X, E \setminus \{b\} \rangle$ we still have
$B \rightarrow D$ and $D \rightarrow C$, and thus $B \rightarrow C$ (by the
transitivity of $\rightarrow$). This in turn implies that $b$ is not a
bridge between $B$ and $C$.

(Part 2) Since $b$ is a bridge between $B$ and $C$, it must occur in every
metapath from $B$ to $C$, and thus $b \in M_1 \cap M_2$.

(Part 3; proof by contradiction) Note that a metapath from $B'$ to $C'$ is
also a metapath from $B$ to $C$. Thus, if $b$ is not a bridge between $B'$
and $C'$, then there must be at least one metapath from $B'$ to $C'$ (and
hence from $B$ to $C$) in $S_1 = \langle X, E \setminus \{b\} \rangle$,
which implies that $b$ is not a bridge between $B$ and $C$. □

Since the adjacency matrix of a metagraph and its closure can be used to
find all the metapaths $M(B, C)$ between two sets of elements, these
metapaths can be examined to identify bridges as well. Clearly, if any two
of the metapaths are edge-disjoint, then by the above theorem, there is no
bridge between the given set pair $(B, C)$. If this is not the case, then
find the set of edges $D$ such that every edge $e$ in $D$ occurs in every
metapath $M(B, C)$ (by comparing the edge sets of the metapaths). Every such
edge is then a bridge between $B$ and $C$.
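
Once the metapaths are in hand, the comparison reduces to a set
intersection. The sketch below assumes the caller supplies the complete
list of metapaths from $B$ to $C$ as sets of edge names; the function name
and the sample edge sets are hypothetical.

```python
def bridges(metapaths):
    """Edges common to every metapath in the (assumed complete) list."""
    edge_sets = [set(m) for m in metapaths]
    if not edge_sets:
        return set()  # no metapath at all, so nothing to cut
    return set.intersection(*edge_sets)


# Two overlapping metapaths share e2; two edge-disjoint ones share nothing.
print(bridges([{"e1", "e2", "e3"}, {"e2", "e4"}]))
print(bridges([{"e1"}, {"e2"}]))
```

The result is only meaningful if the list really contains every metapath
between the two sets; an edge shared by an incomplete list need not be a
bridge.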




Chapter 4 

METAGRAPH TRANSFORMATIONS 

So far, we have considered a variety of features of a metagraph, where these
features are specified in terms of the metagraph structure as given.
However, there are many situations where it may be desirable to transform
the given structure of a metagraph into another form that more effectively
discloses certain structural features and/or facilitates certain analyses.
In this chapter, we explore the transformation of a metagraph from one form
to another that provides a different view of the system and/or relationships
described by the metagraph.

There are several benefits of supporting views of systems. These include
improved focus on relevant elements and relationships among them, logical
independence between views and their basis, customization of views for
different users of the system and modeling tool, and information sharing at
different levels of abstraction.

In this chapter, we discuss three specific types of views that can be used
to focus attention on different aspects of a large system. First, we
describe the projection operation, which describes the relationships among a
specific subset of the generating set of a base metagraph. Then, we discuss
the pseudo-dual of a metagraph, and finally, we discuss the element flow
metagraph.

1. HIERARCHICAL ABSTRACTION USING 
PROJECTION 

When a metagraph has a large number of elements and edges, the visual- 
ization benefits of the metagraph can be reduced by the resulting complexity, 
including the difficulty of rendering the visualization of so many relationships 
on a two-dimensional surface. In such situations, it is useful to be able to fo- 
cus attention on a smaller set of elements and the relationships between them. 
The notion of a “view” has been used in the database literature to achieve such 
an effect in a complex, multiuser database (Date, 1995). Our approach to the 
definition of a view of a metagraph is through the use of a projection of the 
metagraph into a simpler metagraph. 

Definition 4.1. Given a metagraph $S = \langle X, E \rangle$ and
$X' \subseteq X$, a metagraph $S' = \langle X', E' \rangle$ is a projection
of $S$ over $X'$ if:

1. For any $e' = \langle V', W' \rangle \in E'$ and for any $x' \in W'$
there is a dominant metapath $M(V', \{x'\})$ in $S$,

2. For every $x' \in X'$, if there is any dominant metapath $M(V, \{x'\})$
in $S$ with $V \subseteq X'$, then there is an edge
$\langle V', W' \rangle \in E'$ such that $V = V'$ and $x' \in W'$, and

3. No two edges in $E'$ have the same invertex.

The third condition simplifies the projection by minimizing the number of
edges in it; it also allows the projection to be unique. Note that a
projection does not show all relationships between the elements in the
projection's generating set, only those that require minimal sets of inputs
for each outvertex element. For instance, if there are two edges
$\langle \{a\}, \{b\} \rangle$ and $\langle \{a, c\}, \{b\} \rangle$ in the
given (or base) metagraph, then only the former edge appears in a projection
that includes all these elements. In other words, the projection identifies
only the necessary sets of elements for computing each element. While
broader definitions have several merits and it may be useful to identify
other views in some situations, it is difficult to operationalize such
broader classes of views in general; that is, some of the structures that
result from such broader definitions can be misleading.
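
The two-edge example above can be reproduced with a brute-force sketch of
Definition 4.1. This is not an efficient procedure and not the matrix-based
construction; it enumerates every candidate invertex within $X'$, uses
forward chaining as an assumed stand-in for the existence of a metapath,
and excludes an element from its own candidate invertex. The function names
are hypothetical.

```python
from itertools import combinations


def derivable(source, x, edges):
    """True if element x is produced by firing edges starting from source."""
    known, produced = set(source), set()
    changed = True
    while changed:
        changed = False
        for invertex, outvertex in edges:
            if invertex <= known and not outvertex <= produced:
                produced |= outvertex
                known |= outvertex
                changed = True
    return x in produced


def project(edges, x_prime):
    """Map each minimal invertex within x_prime to the elements it determines."""
    elements = sorted(x_prime)
    projection = {}
    for x in elements:
        others = [y for y in elements if y != x]
        minimal = []
        for size in range(1, len(others) + 1):
            for candidate in combinations(others, size):
                V = frozenset(candidate)
                if any(m < V for m in minimal):
                    continue  # a smaller invertex already determines x
                if derivable(V, x, edges):
                    minimal.append(V)
        for V in minimal:
            projection.setdefault(V, set()).add(x)  # one edge per invertex
    return projection


base = [(frozenset({"a"}), frozenset({"b"})),
        (frozenset({"a", "c"}), frozenset({"b"}))]
print(project(base, {"a", "b", "c"}))
```

Grouping outputs by invertex is what enforces the third condition, since
each invertex then appears in exactly one projected edge.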

We can illustrate the projection operation by an example. Consider the
metagraph in Figure 4.1, and consider the projection over the subset
$X' = \{x_1, x_2, x_6, x_7, x_8\}$ of its generating set. The projection,
which appears in Figure 4.2, consists of three edges, each of which is a
dominant metapath in $S$, and the vertices of which are contained in $X'$.
No elements in $X \setminus X' = \{x_3, x_4, x_5\}$ appear in the
projection.

Figure 4.1. An example metagraph.

Figure 4.2. Projection of Figure 4.1 over $X' = \{x_1, x_2, x_6, x_7, x_8\}$.

The purpose of the projection is to provide a high-level view of the
metagraph that hides certain details, in the sense that if an edge
$e' = \langle V', W' \rangle$ appears in $S'$, then it is possible to reach
$W'$ from $V'$ in $S$, but there may be several other intermediate elements
in $X \setminus X'$ that are also traversed. For example, in Figure 4.2 we
can see that it is possible to calculate $x_6$ given only $x_1$, and the
fact that $x_3$ is an intermediate variable is hidden from the person
viewing the projection, since $x_3 \in X \setminus X'$. In addition, the
fact that $x_4$ is also calculated in the process (by $e_1$) is hidden from
the user, because $x_4 \in X \setminus X'$.

The advantage of a projection is that it may disclose relationships that
are implicit in the original metagraph but are not easy to see because of
the size and complexity of the original metagraph. The relationship
represented by $e'_1$ between $x_1$ and $x_6$ is one example. Another
example is provided by $e'_2$, which represents the invocation of a metapath
$M(\{x_1, x_2\}, \{x_7, x_8\}) = \{e_1, e_2, e_3, e_4, e_5\}$. It may not be
clear from Figure 4.1 that this is a dominant metapath for the calculation
of both $x_7$ and $x_8$. The third edge in the projection, $e'_3$, is easily
discernible from the original metagraph, since it is simply $e_5$.

Because each edge in a projection corresponds to one or more metapaths
in the original metagraph, it is useful to establish the set or sets of
edges in the original metagraph that correspond to each edge in the
projection. This is called the composition of the projected edge and is
defined as follows:

Definition 4.2. Given a metagraph $S = \langle X, E \rangle$ and its
projection $S' = \langle X', E' \rangle$ along $X' \subseteq X$, the
composition $C(e')$ of an edge $e' \in E'$ is the set of metapaths in $E$
that correspond to $e'$.

Thus, in Figures 4.1 and 4.2, $C(e'_1) = \{\{e_1, e_2\}\}$,
$C(e'_2) = \{\{e_1, e_2, e_3, e_4, e_5\}\}$ and $C(e'_3) = \{\{e_5\}\}$. We
note that a composition is not a set of edges, but a set of sets of edges,
because there may be more than one metapath in $S$ corresponding to an edge
in $S'$.

An interesting property of projection and composition as we have defined
them is that they are unique and unambiguous, as stated in the following
theorem (from Basu, Blanning and Shtub, 1997):

Theorem 4.1. The projection of a metagraph along a given subset of its
generating set and the composition of each of its edges is unique.

Proof. Let $S = \langle X, E \rangle$ be the metagraph and let $S'$, $S''$
be two projections of $S$ over $X'$, with
$e' = \langle V', W' \rangle \in E'$ and $e' \notin E''$. Since
$e' \in E'$, there exists a dominant metapath in $S$ from $V'$ to $W'$.
Since $e' \notin E''$ and $S''$ is also a projection of $S$, there must be
an edge $e'' = \langle V'', W'' \rangle \in E''$ with either
$V' \subset V''$ and $W' \subseteq W''$ or $V' \subseteq V''$ and
$W' \subset W''$ (which follows from Definition 4.5). However, if
$V' \subset V''$, then $e''$ violates condition 1 of Definition 4.1 for all
$x \in W'$, and if $W' \subset W''$, then $e'$ violates part 3 in
Definition 4.1. Thus, $V' = V''$ and $W' = W''$, which implies that
$e' = e''$, a contradiction that proves our result.

Now consider any $e' = \langle V', W' \rangle \in E'$. There is at least one
metapath $M(V', W')$ in $S$ from $V' \subseteq X'$ to $W' \subseteq X'$.
Consider the set of all such metapaths
$\{M_1(V', W'), M_2(V', W'), \ldots\}$. Since there can be only one such
set, $C(e')$ is unique. Therefore, for each $e'$, $C(e')$ is unique. □

An interesting property of projections that does not apply to metagraphs
in general is that in a projection, there can be no simple paths of any
length between two elements unless there is an edge in the projection
connecting the elements. The following theorem (also from Basu, Blanning and
Shtub, 1997) states this formally:

Theorem 4.2. Given a metagraph $S = \langle X, E \rangle$ with adjacency
matrix $A$ and closure $A^*$, and its projection
$S' = \langle X', E' \rangle$ over $X' \subseteq X$ with adjacency matrix
$A'$ and closure $A'^*$, $a'_{ij} = \phi$ iff $a'^*_{ij} = \phi$ for any
$x_i, x_j \in X'$.

Proof. First, we show that if $a'^*_{ij} = \phi$, then $a'_{ij} = \phi$.
This follows since $a'_{ij} \subseteq a'^*_{ij}$.

Next we prove by contradiction that if $a'_{ij} = \phi$, then
$a'^*_{ij} = \phi$. Let $i, j$ be such that $a'_{ij} = \phi$, but
$a'^*_{ij} \neq \phi$. Then there must be at least one simple path, $p$,
from $x_i$ to $x_j$ in $S'$ consisting of a sequence of edges
$\langle e'_1, \ldots, e'_K \rangle$ with each such $e'_k \in E'$,
$k = 1, \ldots, K$. The coinputs of $x_i$ in this path are all elements of
$X'$. Thus, there is a metapath from
$\{x_i\} \cup \{x \mid x \text{ is a coinput of } x_i \text{ in } p\}$ to
$x_j$ in $S'$ (and also in $S$). From the definition of projection
(Definition 4.1), if there is a metapath in $S$ from some
$V \subseteq X'$ to some $W \subseteq X'$, then $a'_{mn} \neq \phi$ for all
$x_m \in V$ and $x_n \in W$. Since
$\{x_i\} \cup \{x \mid x \text{ is a coinput of } x_i \text{ in } p\} \subseteq X'$
and $x_j \in X'$, it follows that $a'_{ij} \neq \phi$, which contradicts the
assumption. □

This theorem illustrates that in the case of the projection of a metagraph,
any questions of reachability in the projection metagraph can be answered
using the $A'$ matrix itself (which is based on the $A^*$ matrix for the
base metagraph). The reason is that any connectivity in the base metagraph
between elements of $X'$ is captured in some edge in the projection. Thus,
there is no need to consider paths of length greater than one in the
projection, and the closure of the $A'$ matrix does not have to be computed.

In summary, we have shown that using projection, a metagraph can be
transformed into a higher-level view in which certain elements and
relationships are retained and others are deliberately hidden from the user.
Also, several views may be constructed from a single metagraph, by selecting
different elements in $X'$.

We next show how to construct the projection of a metagraph over a given
set of elements, using the $A^*$ matrix. As indicated by part 2 of the
definition of a projection, this requires the identification of all dominant
metapaths between these elements. A brute force approach is to examine all
combinations of triples in $A^*$, but this could be extremely inefficient,
because it involves examining $2^{\theta}$ combinations of triples, where
$\theta$ is the number of triples in $A^*$. A more efficient procedure can
be built using the following observations about the $A^*$ matrix:

• Any metapath between elements in $X'$ must be composed of paths
corresponding to triples in $a^*_{ij}$ such that both $x_i, x_j \in X'$.

• Any triple in $a^*_{ij}$ with $x_i, x_j \in X'$ such that all the coinputs
are in $X'$ corresponds to a valid metapath between elements in $X'$.

The value of these observations is that they allow us to reduce the number
of triples in the $A^*$ matrix that need to be considered. In the procedure
Projbuild below, the projection is obtained by first building a set $L$,
each element of which is a set of triples from $A^*$ that comprises a
candidate edge in the projection. Then, $L$ is reduced by identifying
dominant metapaths, leading to a set $L_0$ in which each element $l$
corresponds to an edge in the projection. The composition of $l$ is a
collection of edge-sets, each of which corresponds to one or more sets of
triples in $L$.

Procedure Projbuild (S, X') 



1. Reduce A* by eliminating all rows and columns corresponding to ele- 
ments that are not in X'. Let the resulting matrix be Aq. 
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2. Using Ag, identify all edges e e £ such that I 4 c X', and create a set L 
of all such edges (since e corresponds to a candidate edge in the projec- 
tion). 

3. For each combination of triples from A_0, if
(⋃_i coin(t_i)) \ (⋃_i coout(t_i)) ⊆ X', then add the set of edges
corresponding to this set of triples as an element of L.

4. Construct the set L_0 from L by taking each element l of L (each element
of L is a set of edges in the base metagraph), and augmenting it with its
net inputs and its outputs (i.e., each element of L_0 is a triple
⟨{net inputs of l}, {outputs of l}, l⟩).

5. Reduce L_0 to identify dominant metapaths, as follows:

(a) Eliminate each triple i ∈ L_0 that is subsumed by at least one other
triple (i.e., i is subsumed by j if the latter corresponds to a subset of
the edges in i and the outputs of j that are in X' include all such outputs
of i);

(b) For any i ∈ L_0 such that ∃ j ∈ L_0, j ≠ i, and both the outputs and
inputs of j are subsets of those of i, eliminate all the elements in the
output of i that are also outputs of j (since the edges in i are not
a dominant metapath for those elements); if there are no remaining
outputs in the triple, drop the triple.

6. Each triple in L_0 now corresponds to a dominant metapath between
elements in X'. For each set of two or more triples with the same inputs and
outputs, combine them into a single triple whose third component (a set,
each element of which is the third component of one of the original triples)
is the composition of the corresponding projection edge.

7. If there are still multiple triples having the same input, in order to satisfy
part 3 of Definition 4.6, combine these into a single triple whose output
is the union of all the outputs of the component triples; the composition
of the corresponding projection edge e' is formed by taking one edge-set
from the composition (edges) of each component triple, and computing
the union of these edge-sets.

8. Return L_0.
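Step 1 and the second observation above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the text does
not prescribe a data layout, so the sketch assumes that each cell of A* is a
list of (coinputs, cooutputs, edges) tuples keyed by a (row, column) pair, and
the small matrix A_STAR is invented for the example rather than taken from any
figure.

```python
# Hypothetical closure matrix A*: maps (x_i, x_j) to a list of triples.
# Each triple is (coinputs, cooutputs, edges); this layout is an assumption.
A_STAR = {
    ("x1", "x3"): [(frozenset({"x2"}), frozenset(), ("e1",))],
    ("x2", "x3"): [(frozenset({"x1"}), frozenset(), ("e1",))],
    ("x3", "x4"): [(frozenset(), frozenset(), ("e2",))],
    ("x1", "x4"): [(frozenset({"x2"}), frozenset({"x3"}), ("e1", "e2"))],
    ("x2", "x4"): [(frozenset({"x1"}), frozenset({"x3"}), ("e1", "e2"))],
}


def reduce_matrix(a_star, x_prime):
    """Step 1: drop every row and column whose element is not in X'."""
    return {
        (row, col): triples
        for (row, col), triples in a_star.items()
        if row in x_prime and col in x_prime
    }


def directly_valid_triples(a_0, x_prime):
    """Second observation: keep triples whose coinputs all lie in X'."""
    return {
        cell: [t for t in triples if t[0] <= x_prime]
        for cell, triples in a_0.items()
    }


X_PRIME = frozenset({"x1", "x2", "x4"})
A_0 = reduce_matrix(A_STAR, X_PRIME)
VALID = directly_valid_triples(A_0, X_PRIME)
```

With X' = {x1, x2, x4} only the two cells ending in x4 survive the reduction,
and both of their triples qualify because their coinputs are inside X'. The
exponential search over combinations in Step 3 is deliberately left out of the
sketch.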



It can be shown that this procedure Projbuild always terminates with a 
valid projection of the metagraph on the specified element set. The following 
theorem states this formally: 

Theorem 4.3. Procedure Projbuild always terminates with a valid projec- 
tion of the metagraph on the specified element set. 

Proof. The procedure examines all relevant combinations of triples in A*
that could form metapaths between any pair of vertices in X'. Thus, this
procedure finds all valid metapaths within X'. Furthermore, since the number
of triples in A_0 is finite, this procedure always terminates.

To show that the procedure correctly forms the projection, first assume
that there is an edge e in the projection that is not identified by the
procedure. Clearly, e corresponds to at least one dominant metapath between V(e)
and W(e). However, since Projbuild finds all metapaths between elements
in X', e must be included in L. Thus the only possibility is that e is dropped
from L_0 at some stage. However, none of the reductions in Step 5 removes a
triple that is essential for the projection, so e cannot be eliminated. Thus we
have a contradiction.

Now assume that e is identified by the procedure, but is not in the projection.
Clearly, because every element in L corresponds to a valid metapath within X',
the only possibility is that e is not dominant. However, Step 5 eliminates all
such metapaths, and Steps 6 and 7 merge all metapaths with common inputs.
Thus e must be in the projection, which is a contradiction. □

Furthermore, the computational complexity of the procedure is quite man- 
ageable in most cases. As mentioned earlier, a naive approach would be to 
find all metapaths within the detailed metagraph, and then eliminate all those 
having inputs outside X' . In Projbuild, the first step eliminates a possibly 
large number of triples from contention. To appreciate this, note that even if 
the detailed metagraph is on a large generating set, most projections (if they 
are to serve a useful purpose for visualization and planning) are over a rela- 
tively small set; otherwise the projection would be very large and cluttered, 
and thus not much better than the detailed metagraph itself. Beyond this, the 
procedure involves examination of all combinations of the remaining triples, 
the complexity of which is exponential in the number of triples. Thus, the com- 
putational complexity of the procedure depends primarily upon the density of 
the relevant portion of the base metagraph (i.e., the number of triples in Aq), 
which may not be large even if the overall metagraph is very large. 

So far, we have considered views of a single metagraph. In general, however,
there may be multiple related metagraphs in a given problem context. We
have discussed the combination of metagraphs in an earlier chapter. We now 
examine whether views of multiple metagraphs can also be combined. 

Consider a situation where there are two distinct metagraphs over possibly
overlapping generating sets. If the metagraphs are S_1 = ⟨X_1, E_1⟩ and S_2 =
⟨X_2, E_2⟩, then the new metagraph, which we will call the sum of S_1 and S_2,
will be S_12 = S_1 + S_2 = ⟨X_1 ∪ X_2, E_1 ∪ E_2⟩. If X_1 ∩ X_2 ≠ ∅, then S_12 may
contain simple paths and metapaths that are not entirely within either S_1 or S_2.
To see this, consider two projections S'_1 and S'_2 of S_1 and S_2 respectively,
where S'_1 = ⟨X'_1, E'_1⟩ is the projection of S_1 over X'_1 ⊆ X_1 and S'_2 = ⟨X'_2, E'_2⟩
is the projection of S_2 over X'_2 ⊆ X_2. If we combine the two views, we get a
metagraph S'_1 + S'_2 = ⟨X'_1 ∪ X'_2, E'_1 ∪ E'_2⟩. An interesting question is whether
any information about relationships between elements of X'_1 ∪ X'_2 that existed
in S_12 (and thus in the projection of S_12 over X'_1 ∪ X'_2) is lost in this process.
That is, we would like to know whether S'_1 + S'_2 contains the same information
as the metagraph S'_12 = ⟨X'_1 ∪ X'_2, E'_12⟩, which is the projection of S_12 over
X'_1 ∪ X'_2.
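The sum operation, and the way it can create connections that exist in
neither operand, can be sketched as follows. The classes Edge and Metagraph,
the helper reachable, and the two one-edge metagraphs are all hypothetical
names and data introduced for this illustration; reachable is a simple
forward-chaining test, used here as a stand-in for the existence of a metapath
from a given source set.

```python
from typing import FrozenSet, NamedTuple


class Edge(NamedTuple):
    invertex: FrozenSet[str]
    outvertex: FrozenSet[str]


class Metagraph(NamedTuple):
    elements: FrozenSet[str]
    edges: FrozenSet[Edge]


def metagraph_sum(s1, s2):
    """S_12 = S_1 + S_2: union of generating sets and union of edge sets."""
    return Metagraph(s1.elements | s2.elements, s1.edges | s2.edges)


def reachable(s, start):
    """Elements obtainable from `start` by firing edges whose whole
    invertex is already available."""
    known = set(start)
    changed = True
    while changed:
        changed = False
        for e in s.edges:
            if e.invertex <= known and not e.outvertex <= known:
                known |= e.outvertex
                changed = True
    return known


# Hypothetical metagraphs sharing the single element x2.
S1 = Metagraph(frozenset({"x1", "x2"}),
               frozenset({Edge(frozenset({"x1"}), frozenset({"x2"}))}))
S2 = Metagraph(frozenset({"x2", "x3"}),
               frozenset({Edge(frozenset({"x2"}), frozenset({"x3"}))}))
S12 = metagraph_sum(S1, S2)
```

In this toy case x3 can be reached from x1 in S12, although no such connection
exists within S1 or within S2 alone.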

In order to do this we first elaborate on the notion of dominance discussed 
in a previous chapter, as follows: 

Definition 4.3. A metapath M(B, C) in a metagraph dominates another
metapath M'(B', C') if B ⊆ B' and C' ⊆ C.

Definition 4.4. A metagraph S dominates another metagraph S' if every 
metapath in S' is dominated by some metapath in S. 
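The two definitions translate directly into set comparisons. In the sketch
below each metapath is represented only by its (source, target) pair of sets,
which is an assumption made for brevity, and the function names are
hypothetical. The sample pairs are edges quoted in the discussion of
Figures 4.4 and 4.5.

```python
def dominates(m, m_other):
    """M(B, C) dominates M'(B', C') when B is a subset of B' and
    C' is a subset of C (Definition 4.3)."""
    (b, c), (b_other, c_other) = m, m_other
    return b <= b_other and c_other <= c


def metagraph_dominates(paths, paths_other):
    """Definition 4.4: every metapath of the other metagraph is
    dominated by some metapath of this one."""
    return all(any(dominates(m, m2) for m in paths) for m2 in paths_other)


# Pairs quoted in the text, written as (source, target).
SHORT_IN = (frozenset({"x2"}), frozenset({"x5", "x4"}))
LONG_IN = (frozenset({"x2", "x9", "x13"}), frozenset({"x5"}))
TO_X6_X4 = (frozenset({"x1"}), frozenset({"x6", "x4"}))
TO_X6 = (frozenset({"x1"}), frozenset({"x6"}))
```

A metapath therefore dominates another when it needs no more inputs and
delivers no fewer outputs.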

Using these concepts, the relationship between S'_12 and S'_1 + S'_2 can be
characterized in terms of the following theorem:

Theorem 4.4. Consider two metagraphs S_1 = ⟨X_1, E_1⟩ and S_2 = ⟨X_2, E_2⟩,
along with their sum S_12 = ⟨X_1 ∪ X_2, E_1 ∪ E_2⟩. For some X'_1 ⊆ X_1 and
X'_2 ⊆ X_2, let S'_1 = ⟨X'_1, E'_1⟩ and S'_2 = ⟨X'_2, E'_2⟩ be projections of S_1 and S_2,
respectively. Also, let S'_12 = ⟨X'_1 ∪ X'_2, E'_12⟩ be a projection of S_12 over X'_1 ∪ X'_2.
Then S'_12 dominates S'_1 + S'_2.

Proof. We show that S'_1 is dominated by S'_12. Consider an edge e = ⟨V, W⟩ ∈
E'_1. Then there is a dominant metapath M(V, W) = {e_1, ..., e_k} ⊆ E_1 in S_1.
Thus M ⊆ E_1 ∪ E_2, and V, W ⊆ X_1 ⊆ (X_1 ∪ X_2), and thus M is also a metapath
in S_12. Thus, there is an edge e* = ⟨V_1, W_1⟩ in S'_12 that dominates e (that is,
V_1 ⊆ V and W ⊆ W_1).

Similarly, S'_2 is also dominated by S'_12, and thus S'_1 + S'_2 is dominated by
S'_12, and the result follows. □

Note that the converse does not have to hold. That is, there may be some
edges in S'_12 that are not dominated by any edges in S'_1 + S'_2. To illustrate this,
consider the metagraph in Figure 4.3 as S_2.

Now, considering Figure 4.1 as S_1 and Figure 4.3 as S_2, we can construct
the following related metagraphs:

1. We combine the two metagraphs to get S_12 = S_1 + S_2. Then we project
S_12 over the set X'_1 ∪ X'_2 = {x_1, x_2, x_4, x_5, x_6, x_7, x_8, x_9, x_13} to get S'_12.
This is shown in Figure 4.4.
Figure 4.3. A second metagraph S_2.

Figure 4.4. Projection S'_12 of the joint metagraph.

2. We project S_1 over the set X'_1 = {x_1, x_2, x_6, x_7, x_8} to get S'_1, and project
S_2 over the set X'_2 = {x_9, x_2, x_4, x_5, x_13} to get S'_2. Then we combine
these two projections as S'_1 + S'_2. This is shown in Figure 4.5.

We can see from Figures 4.4 and 4.5 that S'_12 dominates S'_1 + S'_2. For
example, some edges, such as ⟨{x_6, x_7}, {x_8}⟩, appear in both S'_12 and S'_1 + S'_2.
This edge is labeled separately in S'_12 and in S'_1 + S'_2, but in this case the
compositions of the two edges are the same, namely {{e_5}}. On the other hand,
the edge ⟨{x_2, x_9, x_13}, {x_5}⟩ in S'_1 + S'_2 does not appear in S'_12, but it is dominated
by the edge ⟨{x_2}, {x_5, x_4}⟩ in S'_12. In addition, there are edges in S'_12
that are not dominated by any edge in S'_1 + S'_2. An example is ⟨{x_1}, {x_6, x_4}⟩.
Although the edge ⟨{x_1}, {x_6}⟩ is found in S'_1 + S'_2, there is no edge in
S'_1 + S'_2 with x_1 in its invertex and x_4 in its outvertex.

A possible negative consequence of integrating high-level views of a model
base is that information may be lost. For instance, in the metagraph S'_1 + S'_2, we
find that the fact that x_4 is reachable from x_1 in the combined metagraph S_12
is lost. This leads to the question of whether there are any conditions
under which no information is lost in projecting two metagraphs and summing
the projections (i.e., other than the trivial cases where X'_1 = X_1, X'_2 = X_2 or
X_1 ∩ X_2 = ∅). In other words, we have seen that every metapath in S'_1 + S'_2 is
dominated by a metapath in S'_12. The question then can be stated as whether
there are any conditions under which the converse is true as well - that is,
every metapath in S'_12 is also dominated by a metapath in S'_1 + S'_2. To answer
this we first establish a term for mutual dominance.

Definition 4.5. Two metagraphs S_1 and S_2 are equivalent if they each
dominate the other.

Note that equivalence is not the same as equality. That is, the edges in the
equivalent metagraphs need not be the same. The difference will be illustrated
below.

A sufficient condition for equivalence of S'_12 and S'_1 + S'_2 is that the
intersection of the generating sets is contained within the intersection of the
projection sets - that is, X_1 ∩ X_2 ⊆ X'_1 ∩ X'_2. Because it is always true that
X'_1 ∩ X'_2 ⊆ X_1 ∩ X_2, this condition is the same as X'_1 ∩ X'_2 = X_1 ∩ X_2. The
following theorem states this formally:
Theorem 4.5. Consider two metagraphs S_1 = ⟨X_1, E_1⟩ and S_2 = ⟨X_2, E_2⟩,
along with their sum S_12 = ⟨X_1 ∪ X_2, E_1 ∪ E_2⟩. For some X'_1 ⊆ X_1 and
X'_2 ⊆ X_2, with X_1 ∩ X_2 = X'_1 ∩ X'_2, let S'_1 = ⟨X'_1, E'_1⟩ and S'_2 = ⟨X'_2, E'_2⟩
be projections of S_1 and S_2, respectively. Also, let S'_12 = ⟨X'_1 ∪ X'_2, E'_12⟩ be a
projection of S_12 over X'_1 ∪ X'_2. Then S'_12 is equivalent to S'_1 + S'_2.

Proof. From Theorem 4.4, we know that S'_12 dominates S'_1 + S'_2. We now
show that under the stated condition, for any edge in S'_12, there is a metapath
in S'_1 + S'_2 that dominates it.

Let M(A, B) be a metapath from A to B in S_12, such that M is in the
composition of some edge e' in S'_12 from A to B. Partition M into two sets
of edges M_1 and M_2 such that M_1 ⊆ E_1 and M_2 ⊆ E_2.

Consider M_1. Since M is a metapath between two sets of elements in X'_1 ∪
X'_2, every element v in netin(M_1) must be either (1) in X'_1, or (2) in X_1 ∩ X_2.
The reason for this is that every net input of M is in X'_1 ∪ X'_2, and thus, all
those elements in netin(M_1) that are not themselves in X'_1 must be reachable
from X'_1 ∪ X'_2 using edges in M_2. But the only elements in all such edges that
are also in X_1 are those in X_1 ∩ X_2. However, by the condition stated in the
theorem, this implies that all the elements in case (2) above are in X'_1 ∩ X'_2.
This, together with case (1), implies that all the elements in netin(M_1) are
in X'_1. Similarly, all the elements in netin(M_2) are in X'_2.

Every element in B must be in the outvertex of some edge in M_1, M_2, or
both. Partition B into B_1 and B_2 as follows:

    B_1 = B ∩ (⋃_{e_i ∈ M_1} W_i),    B_2 = B ∩ (⋃_{e_j ∈ M_2} W_j)

By definition, M_1 is a metapath in S_1 from netin(M_1) to ⋃_{e_i ∈ M_1} W_i. It follows
that M_1 is a metapath from netin(M_1) to B_1 in S_1, and since both the source
and target of the metapath are in X'_1, there is an edge in S'_1 between these
two sets as well. Similarly for M_2 and B_2 in S_2. Thus, in S'_1 + S'_2, there is a
metapath from A to B composed of edges in M_1 ∪ M_2 = M. Because this is
true for every metapath M in S'_12, the result follows. □

The condition X_1 ∩ X_2 = X'_1 ∩ X'_2 did not hold in the previous example,
because X_1 ∩ X_2 = {x_2, x_4, x_5} but X'_1 ∩ X'_2 = {x_2}. Thus there were elements
common to the generating sets of the metagraphs that were not common to the
sets of elements over which the two metagraphs were projected.

By changing the elements over which the projections are constructed, we
can illustrate what happens when the condition is satisfied. For instance, if we
project S_1 over a new X'_1 = {x_1, x_2, x_4, x_5, x_7}, we get X_1 ∩ X_2 =
X'_1 ∩ X'_2 = {x_2, x_4, x_5} - that is, all of the elements that are found in the intersection







A. Basu and R. W. Blanning 



of the generating sets of S_1 and S_2 are also in the sets of elements over which
S_1 and S_2 are projected. The union of the sets over which the metagraphs are
projected is X'_1 ∪ X'_2 = {x_1, x_2, x_4, x_5, x_7, x_9, x_13}. In this case, as before, S'_12
dominates S'_1 + S'_2; however, the converse is also true, i.e., S'_1 + S'_2 dominates
S'_12.
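The sufficient condition of Theorem 4.5 is a plain set equality, so it can be
checked before any projection is computed. The sketch below is a minimal
illustration: the function name is hypothetical, and because the full
generating sets are given only in the figures, it takes the intersection
X_1 ∩ X_2 stated in the text as its input, together with the projection sets
of the two examples.

```python
def satisfies_sufficient_condition(common_generating, x1_prime, x2_prime):
    """True when X_1 ∩ X_2 equals X'_1 ∩ X'_2 (condition of Theorem 4.5)."""
    return set(common_generating) == set(x1_prime) & set(x2_prime)


# Sets as stated in the two examples of this section.
COMMON = {"x2", "x4", "x5"}                        # X_1 ∩ X_2
X2_PRIME = {"x2", "x4", "x5", "x9", "x13"}         # X'_2 in both examples
FIRST_X1_PRIME = {"x1", "x2", "x6", "x7", "x8"}    # first example
SECOND_X1_PRIME = {"x1", "x2", "x4", "x5", "x7"}   # second example

first_ok = satisfies_sufficient_condition(COMMON, FIRST_X1_PRIME, X2_PRIME)
second_ok = satisfies_sufficient_condition(COMMON, SECOND_X1_PRIME, X2_PRIME)
```

The first choice of X'_1 fails the test and the second passes it. Since the
condition is sufficient rather than necessary, a failed test does not by
itself prove that information is lost.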

It is also important to note that equivalence (i.e., mutual dominance)
does not necessarily imply equality. The metagraphs in Figures 4.6 and 4.7
are clearly not the same, in part because of the existence of the edge
⟨{x_2, x_9, x_13}, {x_5}⟩ in Figure 4.7. However, this edge does not destroy the




Figure 4.6. The second projection of S_12 over {x_1, x_2, x_4, x_5, x_7}.




Figure 4.7. Sum of the projections in Figures 4.3 and 4.6. 
equivalence of S'_12 and S'_1 + S'_2, since it is dominated by an edge in Figure 4.6.
In addition, an edge with invertex {x_1, x_2} in Figure 4.7 does not appear in S'_12,
but it is also dominated by an edge in Figure 4.6.

Although S'_12 (Figure 4.6) and S'_1 + S'_2 (Figure 4.7) are equivalent, there
is a difference between them: S'_12 seems simpler than S'_1 + S'_2. One reason
for this is that a projection such as S'_12 only contains dominant metapaths,
whereas the sum of two projections, such as S'_1 + S'_2, need not. For example,
in Figure 4.7, the edge ⟨{x_2, x_9, x_13}, {x_5}⟩ is in S'_1 + S'_2, but not in S'_12,
because it is dominated by e'_12. Another reason for the simplicity of S'_12 is the
requirement that no two edges can have the same invertex. Thus, a single edge in
S'_12 corresponds to three edges in S'_1 + S'_2. As before, there
is no such simplifying requirement for the sum of two projections, only for a
single projection.

In summary, we have seen that there is a simple criterion for the integration 
of two views that avoids the misleading impression that a calculation cannot 
be performed (e.g., that expected service life cannot be calculated from the 
design variables) when in fact the calculation can be performed. The require- 
ment is that all variables common to the two sets of calculations (i.e., the life 
cycle costing calculations and the cost estimating relationships) also be in both 
of the sets of variables used to construct the higher-level views. We have also 
seen that two views can be equivalent without being identical. In our second
example, the sum of the higher-level views of the life cycle costing
calculations and of the cost estimating relationships was equivalent to a single view
constructed directly from both sets of calculations.

At the same time, the ways in which these relationships were presented to 
the user were different in two respects. First, the sum of the high-level views 
contained redundant information, in the form of dominated metapaths. For ex- 
ample, the sum of the views disclosed that miles driven, fuel cost, and annual 
cost of preventive maintenance are sufficient to calculate annual operation and 
support cost, but it also disclosed that miles driven is sufficient to calculate an- 
nual operation and support cost. The second difference is that several relation- 
ships in the sum of the higher-level views may appear as only one relationship 
in the direct view of both sets of calculations. For example, the sum of views 
contained separate relationships between miles driven and service life, miles 
driven and annual operation and support cost, and miles driven and life cycle
operation and support cost. In the direct view this is presented to the users as 
a single relationship: miles driven is sufficient to calculate all three variables. 
Thus, the users can easily see that several of the variables of interest to them 
are determined by a single variable, miles driven. 

In cases where X_1 = X_2 = X (i.e., the two base metagraphs are defined on
the same generating set), Theorem 4.5 is not very useful. However, the following
condition can still be used to check the base metagraphs before combining
their projections:

Corollary. Consider all elements y ∈ [(X_1 ∩ X_2) \ (X'_1 ∩ X'_2)]. Then, if
there are no two distinct elements z_1, z_2 ∈ (X'_1 ∪ X'_2) such that there is a
simple path from z_1 to y as well as a simple path from y to z_2 in S_1 ∪ S_2, then S'_12
is equivalent to S'_1 + S'_2.

The proof of this corollary is essentially the same as that for Theorem 4.5,
because under the stated condition, all the elements in netin(M_1) are still
in X'_1 (or in X'_1 ∩ X'_2 and thus in X'_1).

2. THE INVERSE METAGRAPH 

The inverse metagraph is a representation in which the generating set is 
made up of edges from the original metagraph, and where the edges corre- 
spond to combinations of elements from the original metagraph’s generating 
set. Thus, the inverse metagraph is intuitively like the dual of a simple or di- 
rected graph. As with these simple graph constructs, one could construct a dual 
metagraph by simply transposing the incidence matrix of the original meta- 
graph. However, the semantics of the resulting dual structure are different (in 
that the invertex of each edge represents a disjunction, rather than a conjunc- 
tion, as in the primal metagraph). That is why we define the inverse metagraph, 
which provides the necessary “edge-centered” representation while retaining 
the conjunctive form of the edges. 

Definition 4.6. Given a metagraph S = ⟨X, E⟩, its inverse T = ⟨X', E'⟩ is
a metagraph such that X' = E ∪ {α, β}, where α denotes the external source,
β denotes the external target, and e' ∈ E' iff in the primal metagraph, the
primal elements corresponding to e' are in the outvertex of the primal edges
in V_e' and also in the invertex of the primal edges in W_e'. In addition (1) all
pure inputs (i.e., elements that are not in any outvertex) in the primal appear
in the inverse metagraph as edges from α, and (2) all pure primal outputs (i.e.,
elements that are not in any invertex) appear in the inverse metagraph as edges
to β.

For example, the inverse of the metagraph in Figure 4.8 is the metagraph 
in Figure 4.9. Since both representations are metagraphs, properties such as 
paths, metapaths, cycles and bridges can be applied to the inverse metagraph 
just as to the primal, and the same algebraic operations and procedures can be 
applied to both. Thus, the inverse metagraph provides a complementary visual 
representation of a system that still supports metagraph analysis. 
Figure 4.8. A primal metagraph.




The inverse of a given metagraph can be generated from its incidence
matrix G. This is a matrix whose rows correspond to the elements in the
metagraph’s generating set and whose columns correspond to the edges in the
metagraph. There is a “+1” (resp., “−1”) entry whenever the row element is an
output (resp., input) of the edge corresponding to the column; all other entries
are null.

The inverse metagraph can be constructed using the following procedure:
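As a minimal sketch of this construction, the incidence matrix can be built
from an edge list as follows. The dictionary layout and the function name are
assumptions made for illustration, null entries are written as 0, and the four
edges are the ones implied by the matrix G of the worked example given later
in this section.

```python
ELEMENTS = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

# Edge name -> (invertex, outvertex), read off the example matrix G.
EDGES = {
    "e1": ({"x1", "x2"}, {"x3", "x4"}),
    "e2": ({"x1"}, {"x5"}),
    "e3": ({"x3", "x4", "x5"}, {"x6", "x8"}),
    "e4": ({"x6", "x7"}, {"x1"}),
}


def incidence_matrix(elements, edges):
    """Return g[element][edge]: +1 for an output, -1 for an input, else 0."""
    g = {}
    for x in elements:
        g[x] = {}
        for name, (invertex, outvertex) in edges.items():
            if x in outvertex:
                g[x][name] = +1
            elif x in invertex:
                g[x][name] = -1
            else:
                g[x][name] = 0
    return g


G = incidence_matrix(ELEMENTS, EDGES)
```

A row containing only −1 entries then identifies a pure input, and a row
containing only +1 entries identifies a pure output.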

Procedure Inverse 



1. For each column j of G, form all combinations of columns k such that
g_ij = −1 and g_ik = +1, selecting no more than one column from each
such row, and create an edge with each of the “+1” column indices in its
invertex and each of the “−1” column indices in its outvertex. Label the
edge with the set of row-column index pairs used to construct the edge
(e.g., if entries g_ij and g_mn are used for the invertices, and column p is
used for the outvertex of edge y, then label(y) = {(x_i, e_j), (x_m, e_n)}).

2. If a set of two or more edges have the same invertices and labels, then
replace all these edges with a single edge having the same invertex and
label, and the union of all the component outvertices in the outvertex.

3. For each row i in G that has only −1 (resp., +1) entries, create a single
edge to (resp., from) the column entries from (resp., to) α (resp., β), with
the label {(x_i, α)} (resp., {(x_i, e_k)} for each column e_k with a “+1”).



For the metagraph in Figure 4.8, the matrix G is as follows:



G      e_1   e_2   e_3   e_4
x_1    -1    -1     0    +1
x_2    -1     0     0     0
x_3    +1     0    -1     0
x_4    +1     0    -1     0
x_5     0    +1    -1     0
x_6     0     0    +1    -1
x_7     0     0     0    -1
x_8     0     0    +1     0


and the algorithm would proceed as follows: 

1. The four edges ⟨{e_4}, {e_1}⟩ (with label (x_1, e_4)), ⟨{e_4}, {e_2}⟩ (with
label (x_1, e_4)), ⟨{e_1, e_2}, {e_3}⟩ (with label (x_3, e_1), (x_4, e_1), (x_5, e_2)) and
⟨{e_3}, {e_4}⟩ (with label (x_6, e_3)) are identified.

2. The edges ⟨α, {e_1}⟩ (with label (x_2, α)), ⟨α, {e_4}⟩ (with label (x_7, α)),
and ⟨{e_3}, β⟩ (with label (x_8, e_3)) are added.

The result is the inverse metagraph in Figure 4.9. Note that the inverse is
equivalent to the original metagraph, in that the latter can be reconstructed
given the former.
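Steps 1 and 3 of Procedure Inverse can be sketched in Python for the example
matrix G. This is an illustrative reading of the procedure rather than a
definitive implementation: the function names are hypothetical, the strings
"alpha" and "beta" stand in for the external source and target, exactly one
producing column is chosen for every input row that has a producer, and the
merging of Step 2 is omitted.

```python
from itertools import product

# Example incidence matrix G: element -> {edge: entry}.
G = {
    "x1": {"e1": -1, "e2": -1, "e3": 0, "e4": +1},
    "x2": {"e1": -1, "e2": 0, "e3": 0, "e4": 0},
    "x3": {"e1": +1, "e2": 0, "e3": -1, "e4": 0},
    "x4": {"e1": +1, "e2": 0, "e3": -1, "e4": 0},
    "x5": {"e1": 0, "e2": +1, "e3": -1, "e4": 0},
    "x6": {"e1": 0, "e2": 0, "e3": +1, "e4": -1},
    "x7": {"e1": 0, "e2": 0, "e3": 0, "e4": -1},
    "x8": {"e1": 0, "e2": 0, "e3": +1, "e4": 0},
}
EDGES = ["e1", "e2", "e3", "e4"]


def internal_inverse_edges(g, edges):
    """Step 1: one inverse edge per choice of producers for a consumer."""
    found = []
    for j in edges:
        choices = []
        for x, row in g.items():
            if row[j] != -1:
                continue
            producers = [(x, k) for k in edges if row[k] == +1]
            if producers:  # pure inputs are handled in Step 3
                choices.append(producers)
        if not choices:
            continue
        for combo in product(*choices):
            invertex = frozenset(k for _, k in combo)
            found.append((invertex, frozenset({j}), frozenset(combo)))
    return found


def boundary_inverse_edges(g, edges):
    """Step 3: link pure inputs to alpha and pure outputs to beta."""
    found = []
    for x, row in g.items():
        consumers = frozenset(k for k in edges if row[k] == -1)
        producers = frozenset(k for k in edges if row[k] == +1)
        if consumers and not producers:
            found.append(("alpha", consumers, x))
        elif producers and not consumers:
            found.append((producers, "beta", x))
    return found


INTERNAL = internal_inverse_edges(G, EDGES)
BOUNDARY = boundary_inverse_edges(G, EDGES)
```

Each entry of INTERNAL is an (invertex, outvertex, label) triple, and each
entry of BOUNDARY records the pure input or pure output element that gives
rise to the edge.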

3. THE ELEMENT FLOW METAGRAPH 

Another operation, one that focuses attention on the flow of particular pri- 
mal elements, is the transformation of a primal metagraph into its correspond- 
ing element flow metagraph (EFM). We first define the EFM, provide some 
intuition on its structure, specify an algorithm for generating it for a given 
metagraph, and then illustrate the algorithm with an example. 

Definition 4.7. Given a metagraph S = ⟨X, E⟩ and a specific subset X'
of X, the element flow metagraph corresponding to X' is a metagraph
⟨X', F⟩ in which for each edge f = ⟨V_f, W_f⟩ ∈ F, there exist edges e_1, e_2 ∈ E
such that

1. V_f ⊆ V_{e_1},

2. W_{e_1} ∩ V_{e_2} = Z,

3. W_f ⊆ V_{e_2}.

The set of elements Z is the flow content on f through the (primal) edge
pair e_1, e_2 from V_f to W_f. Since there could be several edge pairs e_1, e_2 ∈ E
corresponding to the same f ∈ F, we also define the flow composition C(f)
as the set of all such edge pairs representing flow on f.

Informally, this transformation is a simplified form of the primal in which
the generating set consists of X', and each edge e' identifies what elements
in X\X' flow from vertices in S containing the elements in V_e' to elements
in vertices containing the elements in W_e'. Thus, the edges represent a
dependency between the elements in X'. However, this dependency is not of the
type represented by the projection operator, which identifies metapaths
between vertices. An edge e' in the EFM represents the fact that there is some
edge in S whose invertex contains the elements in W_e', and which requires
input from some edge whose invertex contains the elements in V_e'.

For example, the EFM defined for the elements x_2, x_4 and x_7 from the
metagraph in Figure 4.8 is shown in Figure 4.10. The composition of each edge
is shown in the legend, where the content of each edge pair is included in
parentheses. Once again, as with the inverse, the EFM can be analyzed using
metagraph analytical procedures. For example, cycles can be identified and
analyzed, metapaths can be identified to determine the scope of the impact of
one set of resources upon another, and bridges can be used to identify critical
element flows. However, since the edges in the EFM only represent direct
element flows (across a single edge in the original metagraph), transitive element
flows are captured only when the set X' covers all the edges in the original
metagraph (in the sense that for each e ∈ E, X' ∩ V_e ≠ ∅).




Figure 4.10. The element flow metagraph for Figure 4.9 with X' = {x_2, x_4, x_7}.








While the EFM could be used for any subset of X, it is particularly useful
when X' is a separate type of element. For instance, in Chapter 7 we will show 
how this transformation is useful when X' represents a set of resources. It 
directly provides a view that represents the interaction between resources. 

The EFM can be generated from a given metagraph by the following Proce- 
dure EFM, which uses the element task incidence matrix G. However, before 
describing this procedure, we need to define a new operator. 

Definition 4.8. Let A be an m × n matrix with row indices x_1, ..., x_m
and column indices y_1, ..., y_n. Let B be an n × p matrix with row indices
y_1, ..., y_n and column indices z_1, ..., z_p. Then the operation A ⊗ B results in
an m × p matrix C with row indices x_1, ..., x_m and column indices z_1, ..., z_p,
such that

    c_ij = ⋃_{k=1,...,n} (a_ik ⊗ b_kj)

and

    a_ik ⊗ b_kj =   y_k   if a_ik = 1 and b_kj = −1,
                   −y_k   if a_ik = −1 and b_kj = −1,
                    0     otherwise.



Now, let G_1 be the sub-matrix of G corresponding to the elements in X'
and G_2 be the sub-matrix of G corresponding to elements in X\X' that are
neither pure inputs nor pure outputs. Then the procedure is defined as follows:



Procedure EFM 



1. Perform the operation G_2 ⊗ G_1^T (G_1^T is the transpose of G_1); the result
is a matrix R whose rows correspond to the non-terminal elements in
X\X' and whose columns correspond to the elements in X'.

2. For each row r_i of R corresponding to element x_i, construct edges as
follows:

(a) for each edge t_a (resp., t_b) appearing as a positive (resp., negative)
entry in any cells in r_i, create an invertex (resp., outvertex) consisting
of all column indices corresponding to the columns z_j such that
t_a ∈ r_ij (resp., −t_b ∈ r_ij);

(b) combine each such invertex and outvertex pair as an edge. The flow
composition of the edge is defined as {x_i(t_a, t_b)};

(c) combine all edges with the same vertices into a single edge whose
flow composition is the union of the flow compositions of the
component edges.
The procedure can be applied to the metagraph in Figure 4.8 to generate the
EFM in Figure 4.10 for X' = {x_2, x_4, x_7} as follows:

Step 1: Based on the partitioning of the incidence matrix with x_2, x_4, x_7 for
G_1 and x_1, x_3, x_5, x_6 for G_2, the R matrix is as follows:



R      x_2    x_4    x_7
x_1   -e_1     0    +e_4
x_3   +e_1   -e_3     0
x_5     0    -e_3     0
x_6     0    +e_3   -e_4




Step 2: From the row for x_1, we get the edge e' = ⟨{x_7}, {x_2}⟩ with flow
composition {x_1(e_4, e_1)}; similarly, we get the edges e'' and e''' from rows x_3 and
x_6, respectively.

Note that there is no edge corresponding to the element x_5. This is because
the edge that produces that element is not covered by X' . This illustrates why 
the covering condition stated earlier is important in order for an EFM to cap- 
ture all element flows, both direct and indirect, among the elements of X' . 
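Both steps of Procedure EFM can be reproduced for this example in a short
Python sketch. It is an illustration under stated assumptions, not a
definitive implementation: the function names are hypothetical, a signed entry
such as +e_4 is stored as the tuple ("+", "e4"), each cell of R is a set of
such tuples (empty for a 0 entry), and B is passed to the operator through its
transpose, so that the rows of G_1 can be used directly.

```python
# Example incidence matrix G: element -> {edge: entry}.
G = {
    "x1": {"e1": -1, "e2": -1, "e3": 0, "e4": +1},
    "x2": {"e1": -1, "e2": 0, "e3": 0, "e4": 0},
    "x3": {"e1": +1, "e2": 0, "e3": -1, "e4": 0},
    "x4": {"e1": +1, "e2": 0, "e3": -1, "e4": 0},
    "x5": {"e1": 0, "e2": +1, "e3": -1, "e4": 0},
    "x6": {"e1": 0, "e2": 0, "e3": +1, "e4": -1},
    "x7": {"e1": 0, "e2": 0, "e3": 0, "e4": -1},
    "x8": {"e1": 0, "e2": 0, "e3": +1, "e4": 0},
}


def is_terminal(row):
    """True for a pure input (only -1 entries) or pure output (only +1)."""
    return len({v for v in row.values() if v != 0}) <= 1


def otimes(a, b_rows):
    """The operator of Definition 4.8; b_rows[z][k] plays the role of b_kz."""
    result = {}
    for x, a_row in a.items():
        result[x] = {}
        for z, b_row in b_rows.items():
            cell = set()
            for k, a_xk in a_row.items():
                if b_row[k] != -1:
                    continue
                if a_xk == +1:
                    cell.add(("+", k))
                elif a_xk == -1:
                    cell.add(("-", k))
            result[x][z] = cell
    return result


def element_flow_edges(g, x_prime):
    g1 = {x: row for x, row in g.items() if x in x_prime}
    g2 = {x: row for x, row in g.items()
          if x not in x_prime and not is_terminal(row)}
    r = otimes(g2, g1)  # Step 1
    flows = {}
    for x, r_row in r.items():  # Step 2
        plus = {k for cell in r_row.values() for s, k in cell if s == "+"}
        minus = {k for cell in r_row.values() for s, k in cell if s == "-"}
        for t_a in plus:
            inv = frozenset(z for z, c in r_row.items() if ("+", t_a) in c)
            for t_b in minus:
                out = frozenset(z for z, c in r_row.items() if ("-", t_b) in c)
                flows.setdefault((inv, out), set()).add((x, t_a, t_b))
    return r, flows


R, FLOWS = element_flow_edges(G, {"x2", "x4", "x7"})
```

FLOWS maps each (invertex, outvertex) pair to its flow composition, so edges
with identical vertices are merged automatically, as in step 2(c). The row for
x5 has no positive entry and therefore contributes no edge.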





Chapter 5 

ATTRIBUTED METAGRAPHS 

As described thus far, metagraph edges are set-to-set mappings with no further 
information attached. However, it is possible to attach attributes to metagraph 
edges. In this chapter, we examine how both qualitative and quantitative at- 
tributes can be added to metagraph edges. 

1. QUALITATIVE ATTRIBUTES 

A qualitative attribute is essentially a label that is added to each metagraph 
edge, in addition to the edge identifier (name) label. An example of such an 
attribute is a color, like ‘Blue’ or ‘Red’. Since such attributes are only labels, 
they cannot be used in arithmetic operations. However, they can still be very 
useful in algebraic analysis of metagraph structure. At the simplest level, they 
can be used to partition the metagraph (e.g., into edges of the same color or 
type). They can also be used to constrain path or metapath selection. 

The idea of adding qualitative attributes to edges is also well-established in
traditional graphs and digraphs. However, the richer structure of a metagraph
provides an interesting additional dimension, namely the separation of the
visual and algebraic representation of such attributes. To see this, consider the
examples in Figure 5.1. Both Figures 5.1(a) and 5.1(b) represent the same
metagraph, with the edges assigned a color attribute. The first representation,
in Figure 5.1(a), is visually more informative, and also familiar to users of
attributed graphs and digraphs. On the other hand, the representation in
Figure 5.1(b) treats each color attribute as an additional element of the generating
set, which is then included in the invertex of the edges to which it applies.

There are two important implications of this feature in metagraphs. First, 
while the edge representation is easier to interpret visually, the advantage of 
the vertex representation is that each attribute value needs to appear only once, 
and all edges with that attribute value are immediately identifiable (since that 
value is in their invertex). So in a large metagraph, finding all the blue edges 
is easy in the vertex representation. Of course, it may also be very difficult to 
draw this metagraph on a planar surface, once it has a large number of edges. 

The second implication is that due to the semantic equivalence of the two 
representations, it is possible to use the edge representation in graphical vi- 
sualization of attributed metagraphs, while using the vertex representation for 
Figure 5.1. Edge (a) and vertex (b) representation of qualitative attributes.



analysis. The significance of this is that in the vertex representation, the
attributes are treated simply as a subtype of element from the generating set.
Thus, attributes are included in the adjacency matrix, and can be subject to 
the same types of analysis as any other element. In other words, the addition 
of qualitative attributes does not force any significant change to the basic 
algebraic constructs and operations in metagraphs, unlike other graph repre- 
sentations. 
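The conversion from the edge representation to the vertex representation can
be sketched as follows. The edge data, the color values and the function names
are hypothetical and serve only to illustrate the idea; they do not reproduce
Figure 5.1.

```python
# Hypothetical edge representation: name -> (invertex, outvertex, color).
COLORED_EDGES = {
    "e1": ({"x1"}, {"x2"}, "Blue"),
    "e2": ({"x2"}, {"x3"}, "Red"),
    "e3": ({"x1", "x3"}, {"x4"}, "Blue"),
}


def to_vertex_representation(colored_edges):
    """Move each attribute value into the invertex of its edge."""
    return {
        name: (frozenset(invertex) | {color}, frozenset(outvertex))
        for name, (invertex, outvertex, color) in colored_edges.items()
    }


def edges_with_attribute(vertex_edges, value):
    """All edges whose invertex contains the given attribute value."""
    return sorted(name for name, (invertex, _) in vertex_edges.items()
                  if value in invertex)


VERTEX_EDGES = to_vertex_representation(COLORED_EDGES)
BLUE_EDGES = edges_with_attribute(VERTEX_EDGES, "Blue")
```

Once the attribute is an ordinary element, finding all blue edges is the same
membership test that would be used for any other element of the generating
set.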
2. QUANTITATIVE ATTRIBUTES

It is also possible to attach quantitative (numerical) attributes to metagraph 
edges. The purpose of this would be to allow certain calculations to be per- 
formed. For example, if the attributes are the costs of the tasks represented 
by the edges, then these attributes can be used to determine the total cost of 
the tasks appearing in a workflow. If the attributes represent the durations of 
the tasks, then they can be used to calculate the duration of the workflow in 
a fashion similar to the PERT/CPM calculations used in project management. 
If the attributes represent measures of performance, such as degrees of relia- 
bility or probabilities of non-failure, then they can be used to determine the 
performance of the workflow. 

These attributes might also be combined. For example, if certain numeri- 
cal attributes represent both time (i.e., activity durations) and cost, then these 
attributes might be combined to perform time/cost tradeoffs. If they represent 
either cost or duration along with probability of non-failure, then they might 
be used to determine the probability distributions of workflow cost or duration, 
depending on what will be done if a task represented by an edge should fail. 
In this chapter we will focus on deterministic activity durations, and we will not
consider time/cost tradeoffs. 
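As a rough illustration of the first two calculations, the sketch below sums
edge costs and computes a PERT-style duration for a small workflow. Everything
in it is an assumption made for the example: the workflow, its costs and
durations are invented, the function names are hypothetical, and the duration
calculation assumes an acyclic workflow in which every element is produced by
at most one edge and an edge can start only when all elements of its invertex
are available.

```python
# Hypothetical workflow: name -> (invertex, outvertex).
WORKFLOW = {
    "e1": ({"a"}, {"b"}),
    "e2": ({"a"}, {"c"}),
    "e3": ({"b", "c"}, {"d"}),
}
COST = {"e1": 10, "e2": 20, "e3": 5}
DURATION = {"e1": 2, "e2": 5, "e3": 1}


def total_cost(workflow, cost):
    """Sum of the cost attributes of the edges in the workflow."""
    return sum(cost[name] for name in workflow)


def workflow_duration(workflow, duration, sources):
    """Earliest time at which every element of the workflow is available."""
    available = {x: 0 for x in sources}
    pending = dict(workflow)
    while pending:
        ready = [name for name, (invertex, _) in pending.items()
                 if all(x in available for x in invertex)]
        if not ready:
            raise ValueError("remaining edges can never be executed")
        for name in ready:
            invertex, outvertex = pending.pop(name)
            start = max((available[x] for x in invertex), default=0)
            for x in outvertex:
                available[x] = start + duration[name]
    return max(available.values())


TOTAL = total_cost(WORKFLOW, COST)
LENGTH = workflow_duration(WORKFLOW, DURATION, {"a"})
```

Here the edge e3 cannot start until both b and c are available, so the slower
of the two preceding edges determines the overall duration.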

3. CONDITIONAL METAGRAPHS 

One particular type of qualitative attribute in metagraphs that is used in a
number of applications is a general assumption, or proposition. A proposition
is a statement that may be either true or false. If a proposition appears in the
invertex of an edge, it must be true for the edge to be used in a metapath.
Each edge may contain zero, one, or more propositions, and each proposition
may appear in one or more edges. Propositions appear in the generating
set along with elements representing other types of variables. We distinguish
metagraphs that contain propositions by the term conditional metagraph. We
define a conditional metagraph as follows.

Definition 5.1. A conditional metagraph is a metagraph 
$S = \langle X_p \cup X_v, E \rangle$, in which $X_p$ is a set of propositions 
and $X_v$ is a set of variables, and: 

1. $\forall e' \in E$, $V_{e'} \cup W_{e'} \neq \emptyset$; 

2. $X = X_v \cup X_p$ with $X_v \cap X_p = \emptyset$ such that 
$\forall p \in X_p$, $\forall e' \in E$, if $p \in W_{e'}$, then $W_{e'} = \{p\}$. 

Note that a metagraph as defined in Chapter 2 is a specialization of a con- 
ditional metagraph in which Xp = 0. Where significant, we can refer to such 
a metagraph as a simple metagraph. 
Thus, conditional metagraphs must meet two constraints in addition to the 
requirement that the generating set be partitioned into variables and 
propositions. First, for each edge, at least one of the vertices must be nonempty, 
and the invertex and outvertex of each edge must be disjoint. Second, if an 
outvertex contains a proposition, then it cannot contain any other element. 

The values of different propositions can be used to specify alternative 
contexts for a conditional metagraph. Thus, if a set of propositions P is true, 
another set Q is false, and the remaining propositions (i.e., 
$X_p \setminus (P \cup Q)$) are undetermined, then this knowledge can be used to 
simplify a conditional metagraph, so that only those edges that are valid in 
that context are retained. Specifically, the context of a conditional metagraph 
S with respect to P and Q, denoted K(P, Q, S), is also a conditional metagraph 
in which (1) any proposition in P is deleted and (2) any edge containing a 
proposition in Q is deleted. If as a result any edge now has a null invertex or 
a null outvertex, then that edge is deleted as well. The resulting context 
metagraph will be a conditional metagraph because the undetermined propositions 
will remain as propositions in the context. The context operation is a useful 
abstraction on metagraphs, since it avoids the need to consider edges that 
cannot be used under the stated (Q) conditions. 

The following definition of a context is a constructive definition specifying 
the simplification process. 

Definition 5.2. Given a conditional metagraph 
$S = \langle X_v \cup X_p, E \rangle$, a set of propositions $P \subseteq X_p$ 
that are known to be true and a set of propositions $Q \subseteq X_p$ that are 
known to be false, we define a context K(P, Q, S) as a conditional metagraph 
derived from S as follows: 

1. For any edge $e' \in E$ containing a proposition $p \in P$, simplify the 
edge by deleting p; if the resulting edge has a null in- or out-vertex, delete 
the edge; 

2. For any edge $e' \in E$ containing a proposition $q \in Q$ in either vertex, 
delete the edge (only the edge and q are deleted, not the other elements 
in the edge’s vertices). 
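The two steps of this construction can be sketched in a few lines. This is a minimal illustration, assuming edges are represented as (invertex, outvertex) pairs of frozensets; the function name `context` and the sample edges are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[FrozenSet[str], FrozenSet[str]]  # (invertex, outvertex)


def context(edges: List[Edge], true_props: Set[str], false_props: Set[str]) -> List[Edge]:
    """Derive the edges of K(P, Q, S) from the edges of S.

    true_props plays the role of P and false_props the role of Q.
    """
    result: List[Edge] = []
    for invertex, outvertex in edges:
        elements = invertex | outvertex
        # Step 2: an edge containing a false proposition is deleted outright.
        if elements & false_props:
            continue
        # Step 1: true propositions are removed from the edge; the edge is
        # dropped if this leaves a null invertex or outvertex.
        if elements & true_props:
            invertex = invertex - true_props
            outvertex = outvertex - true_props
            if not invertex or not outvertex:
                continue
        result.append((invertex, outvertex))
    return result


# Illustrative edges only; p1 is known to be true and p2 to be false.
EDGES = [
    (frozenset({"a", "p1"}), frozenset({"b"})),  # simplified to <{a}, {b}>
    (frozenset({"b", "p2"}), frozenset({"c"})),  # deleted: contains p2
    (frozenset({"a"}), frozenset({"p1"})),       # deleted: outvertex becomes null
]

print(context(EDGES, {"p1"}, {"p2"}))
```

Applying the two steps in either order gives the same result, because an edge containing a proposition in Q is removed regardless of how it has been simplified.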

In transforming a conditional metagraph into a context, the propositions 
whose truth values are known (i.e., $P \cup Q$) no longer appear and need not be 
considered in the model selection process. Thus, a context represents a sim- 
plified view of a model base that allows a user to consider only those models 
known to be relevant, and those variables and propositions whose values can 
be manipulated (e.g., in a sensitivity analysis). The larger the sets P and Q 
(i.e., the more specific the context), the simpler is the resulting conditional 
metagraph. 
Figure 5.2. (a) A conditional metagraph; (b) the corresponding context metagraph. 



An example of a context metagraph is shown in Figure 5.2. Starting with 
the conditional metagraph S in Figure 5.2(a), the context K({pi}, {pi}, S) is 
the conditional metagraph in Figure 5.2(b). 

Given a conditional metagraph, we can ask the following questions: 

1. What propositions are associated with a given metapath? 

2. Which of these propositions may be assigned values initially, and which 
will depend on the execution of edges in the metapath? 

3. If we know at the start that certain propositions are true and that certain 
other propositions are false, how will that affect our decision analysis 
strategy? 

In order to answer these questions, we define a conditional metapath as 
follows. 

Definition 5.3. Given a conditional metagraph 
$S = \langle X_v \cup X_p, E \rangle$, a source $B \subseteq X_v$ and a target 
$C \subseteq X_v$, a conditional metapath is a set of edges 
$CM(B, C) = \{e'_l,\ l = 1, \ldots, L\}$ forming a metapath from $B \cup X_p$ 
to C. The set of relevant propositions is 

$\bigcup_{l=1}^{L} (V_{e'_l} \cap X_p)$. 

This may be partitioned into two subsets, the set of input propositions 

$\bigcup_{l=1}^{L} (V_{e'_l} \cap X_p) \setminus \bigcup_{l=1}^{L} W_{e'_l}$ 

and the set of intermediate propositions 

$\bigcup_{l=1}^{L} (V_{e'_l} \cap X_p) \cap \bigcup_{l=1}^{L} W_{e'_l}$. 

Thus a conditional metapath establishes a relationship between two sets 
of variables, using whatever propositions are necessary to execute the 
necessary edges. The relevant propositions are those that appear in the metapath. 
Each such proposition appears in the invertex of at least one of the edges in 
the conditional metapath, and it may appear in one of the outvertices as well. 
If a proposition does not appear in any of the outvertices, it is a member of 
the set of input propositions; otherwise, it is a member of the set of 
intermediate propositions. 
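The partition just described can be computed directly from the edges of a conditional metapath. The sketch below assumes edges are (invertex, outvertex) pairs of frozensets; the function name and the sample metapath are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[FrozenSet[str], FrozenSet[str]]  # (invertex, outvertex)


def classify_propositions(metapath: List[Edge], propositions: Set[str]):
    """Return (relevant, input, intermediate) propositions of a metapath."""
    consumed = set().union(*(invertex for invertex, _ in metapath))
    produced = set().union(*(outvertex for _, outvertex in metapath))
    relevant = consumed & propositions   # appear in some invertex
    inputs = relevant - produced         # never appear in an outvertex
    intermediate = relevant & produced   # set by an edge of the metapath
    return relevant, inputs, intermediate


# Illustrative metapath only: p1 must be known at the start, whereas p2 is
# established by the second edge before the third edge uses it.
METAPATH = [
    (frozenset({"a", "p1"}), frozenset({"b"})),
    (frozenset({"b"}), frozenset({"p2"})),
    (frozenset({"b", "p2"}), frozenset({"c"})),
]

print(classify_propositions(METAPATH, {"p1", "p2", "p3"}))
```

In this example p3 is not relevant, so its truth value has no bearing on whether the metapath links the source to the target.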

The set of input propositions can be evaluated before any of the edges in 
the conditional metapath are executed. The intermediate propositions are 
evaluated based on some of the outvertex elements of edges in the metapath, once 
the values of those elements are known. The truth or falsity of any other 
propositions - that is, any propositions not in the relevant set - will have no 
impact on the effectiveness of the conditional metapath in linking the source 
elements to the target elements. 

3.1. Projections in Conditional Metagraphs 

In the previous chapter, we introduced the notion of a projection operator 
and showed how it could be used to construct hierarchical views of simple 
metagraphs. We now define the projection operator for conditional metagraphs, 
and then investigate the relationship between projections and contexts. 
Definition 5.4. Given a conditional metagraph 
$S = \langle X_v \cup X_p, E \rangle$, a projection of S over the set 
$X' \subseteq X_v$ is a conditional metagraph 
$N(X', S) = \langle X' \cup X_p, E' \rangle$ such that $e' \in E'$ iff there is 
a dominant metapath from $V_{e'}$ to $W_{e'}$ in S. 

We note two interesting features of this definition. First, the variables in the 
projection are limited to those in $X'$, so none of the variables in 
$X_v \setminus X'$ appear in the projection. Some or all of the assumptions in 
$X_p$ may appear in the projection; the user only specifies $X'$, and the 
necessary assumptions are determined by the definition. Second, the projection 
(and therefore its adjacency matrix) represents all relationships among the 
variables in $X'$, and $a'^{*}_{ij} = \emptyset$ iff $a'_{ij} = \emptyset$ 
(where $A'$ is the adjacency matrix of the projection). 

The first observation implies that the decision maker needs only to specify 
the relevant variables over which the projection is desired, and the operation 
then generates the relevant assumptions for each projected relationship. The 
second observation implies that use of projections does not require the 
computation of the closure of A' (i.e., A'*), which saves some computational effort. 
The projection itself can be computed using the A* matrix of the underlying 
metagraph. Although the complexity of the procedure is exponential in the 
number of triples in the relevant portion of the A* matrix, the size of this por- 
tion depends upon the projection set (the elements in the generating set that 
define the projection). Since in practice this set is not large (otherwise, the 
benefit of constructing the projection is lost), the procedure is still practical. 

Consider again the conditional metagraph illustrated in Figure 5.3. If 
we project this metagraph over the set of variables 
$X' = \{ADV, ECON, NI, UCOST, VOL\} \subseteq X_v$, the result is the 
conditional metagraph illustrated in Figure 5.4. Two propositions, cadv and mkt, 
do not appear in the projection. The reason is that each appears in the 
invertex of only one edge (cadv in that of $e_2$), and neither of these edges 
is a member of a dominant metapath corresponding to an edge in the projection. 

We now consider the issue of combining projections and contexts. More 
specifically, we address the following questions: 

1. Can a context be defined on a projection (and conversely, can a projec- 
tion be defined on a context)? 

2. If both transformations are to be applied on a base conditional meta- 
graph, does the order matter (i.e., are these operations commutative)? 

The answer to the first question is yes, for the simple reason that the re- 
sult of both operations is a conditional metagraph, which is amenable to all 
the standard operations on conditional metagraphs. The answer to the second 
question is also yes, as evidenced by the following theorem: 
Figure 5.3. Conditional metagraph. 

Figure 5.4. Projection of Figure 5.3 over {ADV, ECON, NI, UCOST, VOL}. 



Theorem 5.1. $N(X', K(P, Q, S)) = K(P, Q, N(X', S))$. 

Proof. Let $S_1 = K(P, Q, N(X', S))$ and $S_2 = N(X', K(P, Q, S))$. The proof is 
by contradiction. That is, we show that it is not possible for an edge to be in 
$S_1$ and not in $S_2$, and vice versa. 

1. Let $e \in S_1$, $e \notin S_2$. Since $e \in S_1$, $\exists$ an edge $e'$ in 
$N(X', S)$ such that $in(e) \subseteq in(e')$, $out(e') \subseteq out(e)$ and 
$in(e') \setminus in(e) \subseteq P$; 

$\Rightarrow \exists$ a metapath $M(in(e'), out(e))$ in S; 

$\Rightarrow \exists$ a metapath $M(in(e), out(e))$ in $K(P, Q, S)$. 

But since $e \notin S_2$, $\nexists$ a metapath from $in(e)$ to $out(e)$ in 
$K(P, Q, S)$, which is a contradiction. 

2. Now let $e \in S_2$, $e \notin S_1$. Since $e \notin S_1$, there is no edge 
$e'$ in $N(X', S)$ such that $in(e) \subseteq in(e')$, 
$out(e') \subseteq out(e)$, and $in(e') \setminus in(e) \subseteq P$; 

$\Rightarrow \nexists$ any metapath from $in(e')$ to $out(e)$ in S; 

$\Rightarrow \nexists$ any metapath from $in(e)$ to $out(e)$ in $K(P, Q, S)$. 

But, since $e \in S_2$, it follows that $\exists$ a metapath from $in(e)$ to 
$out(e)$ in $K(P, Q, S)$, which is a contradiction, and thus the result follows. □ 

Figure 5.5. N({ADV, ECON, NI, UCOST, VOL}, K({inf, pkv}, {cadv, mkt}, S)) = 
K({inf, pkv}, {cadv, mkt}, N({ADV, ECON, NI, UCOST, VOL}, S)). 

We illustrate this commutativity property using our earlier example, by 
observing that if we construct the context metagraph for Figure 5.3 with 
P = {inf, pkv}, Q = {cadv, mkt}, and then project this conditional metagraph 
over X' = {ADV, ECON, NI, UCOST, VOL}, we get the conditional metagraph 
in Figure 5.5, which is also the result of defining the context 
P = {inf, pkv}, Q = {cadv, mkt} on the projection of Figure 5.3 over X'. 

3.2. Connectivity and Redundancy 

We now introduce two important properties of metagraphs. The first, con- 
nectivity and especially full connectivity, determines the ability of a metagraph 
to connect certain input variables to certain output variables. The second prop- 
erty is redundancy - that is, a determination of whether there is more than one 
way to connect an input to an output. We begin with some definitions. 

Definition 5.5. Given a conditional metagraph 
$S = \langle X_p \cup X_v, E \rangle$, any two sets $B \subseteq X_v$ and 
$C \subseteq X_v$, and R, a set of logic expressions defined over $X_p$, 
let $M(B, C, S)$ be the set of all edge-dominant metapaths from B to C. An 
interpretation $I(X_p, R)$ is an assignment of truth values to the propositions 
in $X_p$ such that all the expressions in R evaluate to true. $P \subseteq X_p$ 
denotes the set of true propositions in $I(X_p, R)$ and $Q \subseteq X_p$ 
denotes the set of false propositions in $I(X_p, R)$. 

To illustrate this, consider again the example metagraph in Figure 5.1. Let 
B = {a, c}, C = {d}, and R be the single logical expression $(p_1 \vee p_2)$. 
Then $M(B, C, S) = \{\{e_1, e_2, e_3\}, \{e_1, e_4\}\}$. 

We note that a context metagraph K(P, Q, S) corresponding to an 
interpretation $I(X_p, R)$ is a simple metagraph (i.e., it has no propositions), 
since $P \cup Q = X_p$. 

Definition 5.6. Given a conditional metagraph 
$S = \langle X_p \cup X_v, E \rangle$, any two sets $B \subseteq X_v$ and 
$C \subseteq X_v$, and R, a set of logic expressions defined over $X_p$: 

1. B is said to be connected to C with respect to R if for some interpretation 
$I(X_p, R)$, $|M(B, C, K(P, Q, S))| \geq 1$. 

2. B is said to be fully connected to C with respect to R if for every 
interpretation $I(X_p, R)$, $|M(B, C, K(P, Q, S))| \geq 1$. 

3. B is said to be non-redundantly connected to C with respect to R if for 
every interpretation $I(X_p, R)$, $|M(B, C, K(P, Q, S))| \leq 1$. 

It follows from (2) and (3) above that B is fully and non-redundantly 
connected to C if $|M(B, C, K(P, Q, S))| = 1$ for every interpretation 
$I(X_p, R)$. 

Definition 5.7. A conditional metagraph $S = \langle X_p \cup X_v, E \rangle$ 
is non-redundant with respect to R, a set of logic expressions over $X_p$, if in 
the context metagraph $K(P, Q, S) = \langle X_p \cup X_v, E_K \rangle$ 
corresponding to any interpretation $I(X_p, R)$, $\forall x \in X_v$ we have 
$|\{e \in E_K \mid x \in W_e\}| \leq 1$. 

Informally, a metagraph is non-redundant if for every interpretation, each 
element is in the outvertex of at most one edge. In other words, there is at most 
one way to determine the value of the element in each interpretation, so there 
is no ambiguity. 
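A brute-force check of this non-redundancy condition can be sketched as follows. It is only an illustration: edges are assumed to be (invertex, outvertex) pairs of frozensets with nonempty vertices, each expression in R is modelled as a Python predicate over a truth assignment, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[FrozenSet[str], FrozenSet[str]]  # (invertex, outvertex)
Assignment = Dict[str, bool]
Constraint = Callable[[Assignment], bool]


def interpretations(propositions: Set[str],
                    constraints: Iterable[Constraint]) -> Iterator[Assignment]:
    """Yield every truth assignment that satisfies all expressions in R."""
    constraints = list(constraints)
    names = sorted(propositions)
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(holds(assignment) for holds in constraints):
            yield assignment


def surviving_edges(edges: List[Edge], assignment: Assignment) -> List[Edge]:
    """Edges of the context metagraph for a complete interpretation."""
    false_props = {p for p, value in assignment.items() if not value}
    true_props = set(assignment) - false_props
    kept: List[Edge] = []
    for invertex, outvertex in edges:
        if (invertex | outvertex) & false_props:
            continue  # the edge depends on a false proposition
        invertex, outvertex = invertex - true_props, outvertex - true_props
        if invertex and outvertex:
            kept.append((invertex, outvertex))
    return kept


def is_non_redundant(edges: List[Edge], variables: Set[str],
                     propositions: Set[str],
                     constraints: Iterable[Constraint]) -> bool:
    """True if, in every interpretation, each variable has at most one producer."""
    for assignment in interpretations(propositions, constraints):
        kept = surviving_edges(edges, assignment)
        for x in variables:
            if sum(1 for _, outvertex in kept if x in outvertex) > 1:
                return False
    return True


# Illustrative data only: two alternative ways of producing b from a.
EDGES = [
    (frozenset({"a", "p1"}), frozenset({"b"})),
    (frozenset({"a", "p2"}), frozenset({"b"})),
]
EXCLUSIVE = [lambda v: v["p1"] != v["p2"]]  # exactly one of p1, p2 holds

print(is_non_redundant(EDGES, {"a", "b"}, {"p1", "p2"}, EXCLUSIVE))  # True
print(is_non_redundant(EDGES, {"a", "b"}, {"p1", "p2"}, []))         # False
```

Without the exclusivity expression, the interpretation in which both propositions are true leaves two edges producing b, so the metagraph is redundant with respect to the empty set of expressions.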

Given the algebraic representation of a conditional metagraph in terms of 
its adjacency matrix A and closure A*, the set of metapaths from one set 
of elements B to another set of elements C can be identified using A*. The 
procedure is as follows: 

1. A* is reduced to the rows corresponding to B and the columns corre- 
sponding to C (since all edge-dominant metapaths from B to C can be 
constructed from this sub-matrix); 

2. For each interpretation, the corresponding reduced context metagraph is 
generated; 

3. All valid metapaths from B to C in the context are then constructed for 
that interpretation using the procedure specified in Chapter 3. 
The computational complexity of the above procedure depends upon the 
size of the proposition set $X_p$. In general, the number of possible 
interpretations of a given set of propositions and set R of expressions is 
exponential in N, the number of propositions (the worst case complexity is 
$2^N$). However, in many practical situations, the number of possible 
interpretations will be sufficiently small to render the procedure feasible. For 
instance, in the context of process modeling, the number of interpretations of R 
corresponds to the number of alternate workflows for the represented process, 
which is not likely to be very large for typical business processes. Also, 
R may have some special structure that can be exploited to restrict the 
search for interpretations. For example, in process modeling, R may contain 
a number of expressions of mutual exclusivity between complementary 
literals (e.g., $X_p$ = {salaried, hourly, high-risk, low-risk} and 
R = {(salaried $\oplus$ hourly), (high-risk $\oplus$ low-risk)}, where $\oplus$ 
means exclusive disjunction). 
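The effect of such mutual-exclusivity expressions on the number of interpretations can be seen by enumerating the example above. This is a small sketch in which each expression in R is modelled as a Python predicate; the variable names are illustrative.

```python
from itertools import product

props = ["salaried", "hourly", "high-risk", "low-risk"]

# R = {(salaried XOR hourly), (high-risk XOR low-risk)}
constraints = [
    lambda v: v["salaried"] != v["hourly"],
    lambda v: v["high-risk"] != v["low-risk"],
]

# All 2^N truth assignments, then only those satisfying every expression in R.
all_assignments = [dict(zip(props, values))
                   for values in product([True, False], repeat=len(props))]
valid = [v for v in all_assignments if all(c(v) for c in constraints)]

print(len(all_assignments), len(valid))  # 16 4
```

Each exclusivity expression halves the number of admissible combinations for its pair of propositions, so only 4 of the 16 assignments need to be examined.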

Given a metagraph S with pure inputs PI and pure outputs PO, we can test 
whether PI is fully connected to PO in S as follows: 

1. Let S be a simple metagraph. Then S is fully connected if there exists 
any M(PI, PO); 

2. Let S be a conditional metagraph, with a set of propositions $X_p$ used 
as assumptions. Assume that all $X_p$ are pure inputs. Then S is fully 
connected iff there exists an M(PI, PO) for every possible interpretation 
of $X_p$. In effect, this implies that S is fully connected iff there exists an 
M(PI, PO) even when all $X_p$ elements are false. 

In addition to case 2, let R be a set of Horn clause assertions on $X_p$. Again, 
S is fully connected iff there exists an M(PI, PO) even when all $X_p$ elements 
are false. The only additional feature here is that even when the truth values 
of all propositions in $X_p$ are not explicitly known, the unknown propositions 
that are heads of clauses in R can be inferred. 

Consider the union $S_3$ of two fully connected metagraphs $S_1$ and $S_2$ 
(i.e., $S_3 = S_1 \cup S_2$). Since $PO_1$ can be reached from $PI_1$ in all 
interpretations, and $PO_2$ can be reached from $PI_2$, then $PO_1 \cup PO_2$ is 
reachable from $PI_1 \cup PI_2$ in all interpretations. Then it follows that the 
pure inputs $PI_3 \subseteq PI_1 \cup PI_2$, and the pure outputs 
$PO_3 \subseteq PO_1 \cup PO_2$. 

However, it is important to realize that the pure inputs and pure outputs 
of the combined metagraph need not be $PI_1 \cup PI_2$ and $PO_1 \cup PO_2$ 
respectively. Therefore, it does not necessarily follow that $S_3$ is fully 
connected, as is demonstrated by the following example. Let $S_1$ consist of the 
edge $\langle \{a, b\}, \{c, d\} \rangle$ and $S_2$ consist of the edge 
$\langle \{d, f\}, \{b, g\} \rangle$. If $S_3 = S_1 \cup S_2$, then 
$PI_3 = \{a, f\}$ and $PO_3 = \{c, g\}$, and there is no metapath from $PI_3$ to 
$PO_3$. Thus, $S_3$ is not fully connected. Although neither $S_1$ nor $S_2$ is 
cyclic, $S_3$ is cyclic. 
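This counterexample can be checked mechanically. The sketch below treats both metagraphs as simple (no propositions) and uses forward chaining over edges as a stand-in for the existence of a metapath from the pure inputs; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[FrozenSet[str], FrozenSet[str]]  # (invertex, outvertex)


def pure_inputs_outputs(edges: List[Edge]) -> Tuple[Set[str], Set[str]]:
    """Pure inputs appear in no outvertex; pure outputs appear in no invertex."""
    ins = set().union(*(invertex for invertex, _ in edges))
    outs = set().union(*(outvertex for _, outvertex in edges))
    return ins - outs, outs - ins


def reachable(edges: List[Edge], sources: Iterable[str]) -> Set[str]:
    """Elements obtainable from sources by repeatedly executing enabled edges."""
    known = set(sources)
    changed = True
    while changed:
        changed = False
        for invertex, outvertex in edges:
            if invertex <= known and not outvertex <= known:
                known |= outvertex
                changed = True
    return known


S1 = [(frozenset({"a", "b"}), frozenset({"c", "d"}))]
S2 = [(frozenset({"d", "f"}), frozenset({"b", "g"}))]
S3 = S1 + S2

pi3, po3 = pure_inputs_outputs(S3)
print(sorted(pi3), sorted(po3))          # ['a', 'f'] ['c', 'g']
print(po3 <= reachable(S3, pi3))         # False: no edge can be executed
```

Neither edge of the union can be executed from {a, f}: the first needs b, which only the second edge produces, and the second needs d, which only the first produces.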







Theorem 5.2. Given two fully connected metagraphs $S_1$ and $S_2$, their union 
$S_3 = S_1 \cup S_2$ is also fully connected if it is acyclic. 

Proof. Since $S_3$ is acyclic, its elements can be organized in a partial order 
based on the existence of simple paths between elements (i.e., p precedes q 
if there is a simple path from p to q). It follows that the elements in $PI_3$ 
are roots and elements of $PO_3$ are leaves of the precedence graph. 

Consider an element x in $PO_3$ that is not reachable from $PI_3$ in some 
interpretation. Without loss of generality, assume that x is in $PO_1$. Since x 
is reachable from some subset of $PI_1$, say $PI_{1x}$, then it must be true 
that $PI_{1x} \setminus PI_3 \neq \emptyset$. Let $PI_{1x} \setminus PI_3 = Y$. 
Then Y consists of elements that were in $PI_1$, but are not in $PI_3$. Thus, 
each element of Y must be either an internal element of $S_2$ or a pure output 
of $S_2$. The precedence graph to each such element has to ultimately end with 
roots that are in $PI_2$, say $PI_{2x}$. If all these elements are in $PI_3$, 
then x is reachable from $PI_3$ and we are done. However, if 
$PI_{2x} \setminus PI_3 = Z \neq \emptyset$, then as before, these must be 
internal or pure output elements in $S_1$. Since $S_3$ is acyclic, Z is 
reachable from a subset of $PI_1$, say $PI_{1x2}$, such that 
$PI_{1x2} \cap PI_{1x} = \emptyset$. Since both $PI_1$ and $PI_2$ are finite, 
these iterations must terminate, which proves the result. □ 

This theorem provides a two-step test for whether the combination of two 
fully connected metagraphs is also fully connected, as follows: 

1. Compute A* and examine its diagonal elements. If all of the diagonal 
cells are empty, then full connectivity still holds because the metagraph 
is acyclic; 

2. Test whether there is any cycle containing elements either from $PI_1$ and 
$PO_2$ or $PI_2$ and $PO_1$. If not, then full connectivity still holds. 

If neither condition above holds, then full connectivity cannot be assured. In 
such a case, the resulting metagraph has to be itself tested for full connectivity 
by looking for a metapath from its pure inputs to its pure outputs in each 
interpretation. 

It is possible to determine whether a given metagraph is non-redundant in a 
given context, using the algebraic representation. The incidence matrix G can 
be adapted to the context by reducing it based on the elements in P (the true 
propositions) and Q (the false propositions). Then, the resulting metagraph is 
non-redundant if each row has at most one ‘+1’ entry. Otherwise, there are two 
or more edges (tasks) that produce the same output. As before, the complexity 
of checking whether the metagraph is non-redundant in all contexts depends 
upon the number of valid contexts, which in turn is determined by R (the 
remaining undetermined propositions). 




Chapter 6 

INDEPENDENT SUB-METAGRAPHS 



We now examine the issue of independence of a sub-metagraph contained 
within a larger metagraph. This is a useful notion, since it helps identify 
components of a larger and complex system that can be abstracted at a higher 
level, and possibly removed as a separate subsystem. 

Definition 6.1. A metagraph $S' = \langle X', E' \rangle$ is said to be a 
sub-metagraph (SMG) of another metagraph $S = \langle X, E \rangle$ (denoted by 
$S' \subseteq S$) if $X' \subseteq X$ and $E' \subseteq E$. 

Note that the SMG relationship is defined by edges, not by elements. Thus, 
it is possible for $S' \subset S$ even if $X' = X$ as long as $E' \subset E$. 

Input independence: A metagraph $S_1$ is an input independent SMG of a 
metagraph $S_2$ if every element of $S_1$ that is not a pure input is determined 
only by edges within $S_1$. 

Output independence: A metagraph $S_1$ is an output independent SMG of 
a metagraph $S_2$ if every element of $S_1$ that is not a pure output is used 
(i.e., as an input) only by edges within $S_1$. 

Independence: A metagraph $S_1$ is an independent SMG (denoted ISMG) of 
a metagraph $S_2$ if it is both input independent and output independent. 

To see examples of input independence, output independence and independence, 
consider Figure 6.1. A number of possible sub-metagraphs can be identified 
in this metagraph, and Table 6.1 lists a number of these, along with their 
pure inputs and outputs, and whether they are input independent, output 
independent and/or independent. This example shows that the independence 
properties of a particular SMG are not always evident from visual inspection. 



Table 6.1. Independence of some SMGs in Figure 6.1. 

SMG             PI          PO       Other    Input Ind.  Output Ind.  Independent
{e1, e2}        a, h        d, k, l  c        NO          YES          NO
{e1, e2, e3}    a, h        k, l     c, d     YES         NO           NO
{e1, e3-e5}     c, d, h, q  l, n, p  k        YES         YES          YES
{e1, e2, e5}    a, h, q     l, n, p  c, d, k  NO          NO           NO
Figure 6.1. Illustration of independent components in a metagraph. 



It is also useful to consider the relationships between different ISMGs of 
a given metagraph. The following theorems (from Basu and Blanning, 2003) 
identify some special cases of interest: 

Theorem 6.1. Given two ISMGs $S_1$ and $S_2$ of a common containing 
metagraph S, then $S_3 = S_1 \cup S_2$ is an ISMG of S. 

Proof. We prove this by contradiction. All elements that are not either pure 
inputs or pure outputs of $S_1$ or $S_2$ clearly cannot violate the independence 
of $S_3$. Let x be a pure input of $S_1$ that violates the independence of $S_3$ 
by being in the output of some edge outside $S_3$. Clearly x must be a pure 
output of $S_2$. But by definition, every pure output of $S_2$ is determined 
only by edges within $S_2$, and thus $S_3$, which contradicts the claim about x. 
A similar argument holds if x is a pure output of $S_1$ that is an input to some 
edge outside $S_3$. Thus the result is proved. □ 

Independence is desirable in the sense that any coordination issues involv- 
ing an ISMG can be addressed solely in terms of its pure inputs and pure 
outputs, while this is not true for all SMGs in general. 

Definition 6.2. Two metagraphs are mutually independent if each is an 
ISMG in the metagraph formed by their union. 

Theorem 6.2. If two edge-disjoint SMGs (no common edges) are both 
ISMGs of a single containing metagraph, then they are mutually independent 
of each other. 

Proof. Since the two SMGs are edge-disjoint and each is an ISMG, the only 
common elements can be “boundary elements” (i.e., pure inputs or pure 
outputs), and furthermore, the SMGs must be sequentially related, that is, 
$PI_1$ can overlap with $PO_2$ but not $PI_2$, and $PO_1$ can overlap with 
$PI_2$ but not $PO_2$ (the reason for this is that if $PI_1$ and $PI_2$ have any 
common elements, then because of the independence assumption, all edges 
containing the common elements have to be in both SMGs, which violates their 
edge-disjointedness). It then follows that they are each ISMGs of their union, 
which proves the result. □ 

Theorem 6.3. Given two ISMGs $S_1$ and $S_2$ of a containing metagraph S, 
the intersection $S_1 \cap S_2$ is also an ISMG of S. 

Proof. Each ISMG can be viewed as the union of a set of independent 
metapaths (each itself an ISMG of S and $S_i$), each from some subset of pure 
inputs to a subset of pure outputs of that ISMG. By the definition of 
independence, if any edge e occurs in both $S_1$ and $S_2$, every metapath 
$M(PI_1, PO_1)$ in $S_1$ containing e must also occur in $S_2$. Otherwise, at 
least one edge in M would violate the independence of $S_1$ and $S_2$. Thus 
$S_1 \cap S_2$ can also be viewed as the union of a set of independent metapaths 
from its pure inputs to its pure outputs, and is thus an ISMG of S. □ 

Using the matrix representation of metagraphs, we have developed an algo- 
rithmic procedure to test for independence of a given SMG S' in a metagraph 
S. The procedure is quite straightforward, since the columns in A correspond- 
ing to all elements in S' other than the pure inputs have to be empty except 
for rows corresponding to elements in S'; similarly all rows corresponding to 
elements in S' other than pure outputs have to be empty except for columns 
corresponding to elements in S'. 

The following algorithm determines whether a given sub-metagraph 
S'(X', E') is independent within a given metagraph S(X, E) with adjacency 
matrix A: 

Procedure Check-Independence(S', A) 

    Let PI' (PO') be the set of pure inputs (pure outputs) of S' {generated 
    using Procedure PIPO below}. 

    For i = 1, ..., |X| 

        For j = 1, ..., |X| 

            If [(x_j ∈ X'\PI' and x_i ∉ X') or (x_i ∈ X'\PO' and x_j ∉ X')] 
            and a_ij ≠ ∅ then S' is not independent in S; STOP. 

        Next j 

    Next i 

    S' is independent in S; 

    STOP. 
The following algorithm identifies the sets PI' and PO' used in procedure 
Check-Independence: 

Procedure PIPO(S', A) 

    Let PI' = PO' = ∅; 

    For each x_i ∈ X', 

        If a_ij = ∅ ∀x_j ∈ X' then PO' = PO' ∪ {x_i}; 

        If a_ji = ∅ ∀x_j ∈ X' then PI' = PI' ∪ {x_i}; 

    Repeat; 

    Return PI', PO'; 

    STOP. 

Note that these are polynomial procedures that are guaranteed to terminate. 
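The two procedures can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the adjacency matrix is modelled as a dictionary keyed by (row element, column element) whose entries exist only for nonempty cells, the test follows the row and column conditions described in the text, and all names and sample edges are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[FrozenSet[str], FrozenSet[str]]  # (invertex, outvertex)
Adjacency = Dict[Tuple[str, str], Set[str]]   # nonempty cells a_ij only


def adjacency(edges: Dict[str, Edge]) -> Adjacency:
    """a[(x, y)] holds the edges with x in the invertex and y in the outvertex."""
    a: Adjacency = {}
    for name, (invertex, outvertex) in edges.items():
        for x in invertex:
            for y in outvertex:
                a.setdefault((x, y), set()).add(name)
    return a


def pipo(sub_elements: Set[str], a: Adjacency) -> Tuple[Set[str], Set[str]]:
    """Pure inputs and pure outputs of the sub-metagraph, as in Procedure PIPO."""
    pure_in = {x for x in sub_elements
               if not any((y, x) in a for y in sub_elements)}
    pure_out = {x for x in sub_elements
                if not any((x, y) in a for y in sub_elements)}
    return pure_in, pure_out


def check_independence(sub_elements: Set[str], all_elements: Set[str],
                       a: Adjacency) -> bool:
    """Scan every cell a_ij, as in Procedure Check-Independence."""
    pure_in, pure_out = pipo(sub_elements, a)
    for xi in all_elements:
        for xj in all_elements:
            if (xi, xj) not in a:
                continue  # empty cell
            # Column test: an outside element helps determine a non-pure-input.
            fed_from_outside = xj in sub_elements - pure_in and xi not in sub_elements
            # Row test: a non-pure-output is used by an edge leading outside.
            used_outside = xi in sub_elements - pure_out and xj not in sub_elements
            if fed_from_outside or used_outside:
                return False
    return True


# Illustrative chain a -> b -> c -> d; the candidate SMG covers {a, b, c}.
CHAIN = {
    "e1": (frozenset({"a"}), frozenset({"b"})),
    "e2": (frozenset({"b"}), frozenset({"c"})),
    "e3": (frozenset({"c"}), frozenset({"d"})),
}
SUB = {"a", "b", "c"}

print(check_independence(SUB, {"a", "b", "c", "d"}, adjacency(CHAIN)))  # True

# Adding an outside edge that also determines b breaks input independence.
EXTENDED = dict(CHAIN, e4=(frozenset({"z"}), frozenset({"b"})))
print(check_independence(SUB, {"a", "b", "c", "d", "z"}, adjacency(EXTENDED)))  # False
```

Both functions visit each cell of the matrix a bounded number of times, which is consistent with the polynomial running time noted above.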




PART II 



APPLICATIONS OF 
Chapter 7 

METAGRAPHS IN MODEL MANAGEMENT 



This is the first of three chapters in which we will examine three applications 
of metagraphs to information processing systems. The first is the application 
to the management of decision models, which is examined in this chapter. 
The second is the management of data bases and rule bases, which will be 
examined in Chapter 8. The third is the management of workflow systems, in 
which the work consists of information processing tasks to be performed by 
humans or machines. We will examine this application in Chapter 9. 

There are four important topics in the application of metagraphs to model 
management. The first is the representation of decision models as metagraphs, 
in which the input-to-output mapping of a model is represented by the set-to- 
set mapping in a metagraph edge. Thus, a model base is a set of metagraph 
edges (i.e., a metagraph) which collectively represents the model base. We are 
not interested here in the content of the models; rather we view models as 
“black boxes” and consider only relationships among models as determined 
by common invertex and outvertex elements; for example, an output of one 
model may be an input to another model. 

The second topic is model selection and integration. In both selection and 
integration the principal concept is that of a metapath from a set of known, 
or source, elements (i.e., elements whose values are assumed to be known) 
to a set of desired, or target, elements (i.e., elements whose values are to be 
calculated). We are not concerned here with calculation procedures, but only 
with the existence and uniqueness of metapaths from source to target. In the 
case of multiple metapaths, we wish to identify any bridges - that is, edges 
that must be present in the metagraph regardless of which metapath is used. 
A distinction of interest here is the one between cyclic and acyclic metagraphs, 
which represent two quite different structures of model bases. 

The third topic is hierarchical modeling, which concerns the integration of 
model bases with possibly overlapping variables - that is, the integration of 
metagraphs with possibly overlapping generating sets. Each model base may 
be used in a simplified analysis in which it is projected over a subset of its 
generating set. If the information in these projections is to be integrated, the 
question is whether the union of the two projections is the same as the 
projection of the union of the two metagraphs over the union of the two 
generating sets. In other words, can we combine (by taking the union) the two 
projections, or must we first combine the two metagraphs and then do the 
projection? We will see that the union of the projections is not the same as the 
projection of the union. However, the projection of the union dominates the 
union of the projections. 

The fourth topic is the use of assumptions in model bases. An assumption 
associated with a model is a proposition (i.e., a variable that can take on either 
of two values - true or false) that must be true for the model to be used in 
a calculation procedure (i.e., for an edge to be used in a metapath). Thus, a 
model base is viewed as a conditional metagraph in which the generating set is 
enlarged to include the assumptions as elements and an assumption associated 
with a particular model is part of the invertex of the edge representing the 
model. We consider the projection and context of a conditional metagraph and 
ask if they are commutative: given a conditional metagraph, a subset of the 
generating set, and a set of assumptions (with true, false, or unknown values), 
is the projection of the context the same as the context of the projection? We 
will see that they are the same. 

1. MODELS AS METAGRAPHS 

In the metagraph view of models each model is represented as an edge with 
the inputs as invertex and the outputs as outvertex. The principal issue here 
is connectivity - that is, the existence or lack of existence of one or more 
metapaths connecting a source set of elements to a target set of elements. In the 
case where there is more than one metapath, we wish to know whether there 
are any bridges - that is, any edges in the intersection of all such metapaths. If 
there are any such edges, then they must exist for the source to be connected 
to the target. In other words, the corresponding models must exist if the source 
elements can be used to calculate the target elements. 

Consider the simple example of a model base illustrated in Figure 7.1. There 
are four elements: inflation rate (INFL), which is a pure input, revenues 
realized (REV) and expense incurred (EXP), both of which depend on INFL and 
only on INFL, and the resulting net income (NI), which depends on both REV 
and EXP. Thus, the generating set is X = {INFL, REV, EXP, NI}. There are 
three models, which are represented by three edges. The first is a sales model, 
sls, which calculates its outvertex {REV} from its invertex {INFL}. The second 
is a cost model, cost, which calculates its outvertex {EXP} from its invertex 
{INFL}, and the third is a financial model, fin, which calculates its outvertex 
{NI} from its invertex {REV, EXP}. 

We can see from the various components of the adjacency matrix A 
(Figure 7.2) that sls and cost have no coinputs or cooutputs, but that is not 
true of fin. For example, $a_{INFL,REV}$ discloses that sls needs only INFL as 
input (i.e., there are no coinputs) and produces only REV as its output (there 
are no cooutputs). Similarly, $a_{INFL,EXP}$ discloses that cost needs only 
INFL as its input and produces only EXP as its output. On the other hand, the 
NI column of A discloses that fin has both EXP and REV as inputs but only NI as 
output. That is, $a_{REV,NI}$ has EXP as coinput and $a_{EXP,NI}$ has REV as 
coinput, but in both cases NI has no cooutput. 

        INFL   REV                EXP                 NI
INFL    ∅      {<∅, ∅, <sls>>}    {<∅, ∅, <cost>>}    ∅
REV     ∅      ∅                  ∅                   {<{EXP}, ∅, <fin>>}
EXP     ∅      ∅                  ∅                   {<{REV}, ∅, <fin>>}
NI      ∅      ∅                  ∅                   ∅

Figure 7.2. Adjacency matrix for Figure 7.1. 

The closure of the adjacency matrix, $A^*$, illustrated in Figure 7.3, shows
that there are two simple paths of length exceeding 1, both connecting INFL
to NI. The first is the simple path $\langle sls, fin \rangle$, which has EXP
as coinput and REV as cooutput. The second is the simple path
$\langle cost, fin \rangle$, which has REV as coinput and EXP as cooutput.
Thus, there is no sequence of models (i.e., no simple path) connecting INFL to
NI that is free of coinputs. However, this does not mean that NI cannot be
calculated from INFL alone, for there is a metapath {sls, cost, fin}
connecting INFL to NI. Thus, we can see the advantage of representing model
bases as metagraphs and of defining metapaths: metapaths disclose a type of
connectivity that simple paths do not.
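The difference between simple paths and metapaths can be checked with a small forward-chaining sketch. This is an illustration under stated assumptions, not the authors' algorithm: the function names `reachable` and `bridges` are hypothetical, and the test "the target is computable from the source" is used as a stand-in for "a metapath exists", which is an assumption that is reasonable only for acyclic metagraphs such as this one.

```python
# Hypothetical encoding of the Figure 7.1 model base.
EDGES = {
    "sls":  (frozenset({"INFL"}), frozenset({"REV"})),
    "cost": (frozenset({"INFL"}), frozenset({"EXP"})),
    "fin":  (frozenset({"REV", "EXP"}), frozenset({"NI"})),
}


def reachable(edges, source):
    """Forward chaining: fire every edge whose whole invertex is known.

    Returns the set of computable elements and the list of fired edges.
    The fired edges are executable in order, but they are not guaranteed
    to be a minimal (dominant) metapath.
    """
    known, fired = set(source), []
    changed = True
    while changed:
        changed = False
        for name, (invertex, outvertex) in edges.items():
            if name not in fired and invertex <= known:
                known |= outvertex
                fired.append(name)
                changed = True
    return known, fired


def bridges(edges, source, target):
    """Edges whose removal makes the target uncomputable from the source."""
    result = set()
    for name in edges:
        rest = {n: e for n, e in edges.items() if n != name}
        known, _ = reachable(rest, source)
        if not set(target) <= known:
            result.add(name)
    return result


known, fired = reachable(EDGES, {"INFL"})
print("NI computable from INFL alone:", "NI" in known)
print("edges fired:", fired)
print("bridges:", sorted(bridges(EDGES, {"INFL"}, {"NI"})))
```

In this example every model is needed, so all three edges are bridges between {INFL} and {NI}.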

        INFL   REV               EXP                NI
INFL    ∅      {<∅, ∅, <sls>>}   {<∅, ∅, <cost>>}   {<{EXP}, {REV}, <sls, fin>>,
                                                     <{REV}, {EXP}, <cost, fin>>}
REV     ∅      ∅                 ∅                  {<{EXP}, ∅, <fin>>}
EXP     ∅      ∅                 ∅                  {<{REV}, ∅, <fin>>}
NI      ∅      ∅                 ∅                  ∅

Figure 7.3. Closure A* of the adjacency matrix for Figure 7.1.

Finally, we can see that the model bank is acyclic, since $A^*$ contains only
null diagonal elements. Thus, there is no sequence of edges (simple path) from
any element to itself, and it is possible to execute the models in a sequential
fashion without the need for iterative execution or other simultaneous
implementation of the models. This is accomplished by executing sls and cost
(in either order) and then fin. Of course, if there were an edge connecting NI
to INFL we would have a cyclic model bank and it would be necessary to resolve
the simultaneity. But it is unlikely that such a model would exist, since
the inflation rate in the economy would presumably be exogenous to a
particular firm. Thus, the model bank and its metagraph are acyclic, and the
models (edges) form a partially ordered set.
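One way to obtain an execution sequence consistent with this partial order is a topological sort of the models. The sketch below is an assumption-laden illustration rather than the authors' procedure: it uses Python's standard `graphlib` module, treats model m as dependent on model n whenever an output of n is an input of m, and uses a hypothetical feedback edge `fb` from NI to INFL to show how a cyclic model bank is detected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical encoding of the Figure 7.1 model base.
EDGES = {
    "sls":  (frozenset({"INFL"}), frozenset({"REV"})),
    "cost": (frozenset({"INFL"}), frozenset({"EXP"})),
    "fin":  (frozenset({"REV", "EXP"}), frozenset({"NI"})),
}


def execution_order(edges):
    """Return the models in an order that respects input/output dependencies.

    Raises graphlib.CycleError if the models do not form a partial order.
    """
    deps = {
        m: {n for n, (_, out_n) in edges.items() if n != m and out_n & in_m}
        for m, (in_m, _) in edges.items()
    }
    return list(TopologicalSorter(deps).static_order())


print(execution_order(EDGES))

# Hypothetical feedback model from NI to INFL, as contemplated in the text.
cyclic = dict(EDGES, fb=(frozenset({"NI"}), frozenset({"INFL"})))
try:
    execution_order(cyclic)
except CycleError:
    print("cyclic model bank: sequential execution is not possible")
```

The relative order of sls and cost in the result is arbitrary, which matches the observation that they may be executed in either order.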

2. MODEL SELECTION AND INTEGRATION 

This is not the case with the model base illustrated in Figure 7.4, which
describes the supply and demand relationships in a firm and its market. There
are four elements: the gross national product or other measure of overall
economic activity, GNP; the price charged by the firm, PRI; the resulting sales
volume, VOL; and the firm's capacity to produce, CAP. There are two models,
a demand model, dmd, with invertex {GNP, PRI} and outvertex {VOL}, and a
supply model, sup, with invertex {VOL} and outvertex {PRI, CAP}. This is a
cyclic metagraph describing a cyclic model base.

We will not describe the entire closure of the adjacency matrix, but we note
that it contains two non-null diagonal elements:

$a^*_{PRI,PRI} = \{\langle \{GNP\}, \{CAP, VOL\}, \langle dmd, sup \rangle \rangle\}$,

$a^*_{VOL,VOL} = \{\langle \{GNP\}, \{CAP, PRI\}, \langle sup, dmd \rangle \rangle\}$.

Thus, there is a simple path, $\langle dmd, sup \rangle$, from PRI to itself and
another simple path, $\langle sup, dmd \rangle$, from VOL to itself. Since these
two simple paths are cyclic permutations of each other, there is a single cycle
involving both PRI and VOL. We note that $a^*_{GNP,GNP}$ and $a^*_{CAP,CAP}$ are
both null, since GNP and CAP do not participate in any cycles.

In this case the models do not form a partially ordered set, because we need
to know the value of PRI to calculate VOL and we need to know the value of





Metagraphs in Model Management 







Figure 7.4. Cyclic metagraph. 



VOL to calculate PRI. If we were able to look inside the black boxes and we 
were to find that the models had a simple functional form, such as linearity, 
we could determine equilibrium values for PRI and VOL as a function of GNP
(which would presumably be a parameter in the
$\langle \{GNP, PRI\}, \{VOL\} \rangle$ relationship) in closed form. But since
we assume that the models are atomic (i.e., they are black boxes), that is not
possible and an iterative approach is needed.

The iterative approach would begin by receiving an exogenous input GNP. 
Then there are two possible sequences of activities: 

• We would posit an initial value for PRI, enter these values for GNP and
PRI into the model dmd to calculate VOL, and then use sup to calculate a
second value for PRI. The initial and calculated values for PRI would be
compared. If they were sufficiently close to each other (within a
predetermined threshold), the process would terminate. Otherwise, a new initial
value for PRI would be calculated, possibly by splitting the difference
between the posited and calculated values of PRI, and the calculation cycle
would begin anew. This would continue until the process converged.

• Alternatively, we could begin with an exogenous value of GNP, as before, 
but posit an initial value of VOL. Then sup would be used to calculate a 
value for PRI, which would be entered into dmd to calculate a second 
value for VOL. If the initial and calculated values were sufficiently close 
to each other, the process would terminate, otherwise the cycle would be 
repeated. 

We note that in each of these cases a value for CAP would be calculated 
during each cycle, but only the final value would be of interest. 
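The first of these sequences can be sketched as a damped fixed-point iteration. The code below is a minimal illustration, not a prescribed procedure: the function `solve_by_iteration`, the tolerance, the iteration limit, and the linear forms chosen for dmd and sup are all assumptions introduced here. The driver treats both models as black boxes and only calls them.

```python
def solve_by_iteration(dmd, sup, gnp, pri_guess, tol=1e-6, max_iter=1000):
    """Iterate dmd and sup until the posited and calculated prices agree.

    dmd(gnp, pri) -> vol and sup(vol) -> (pri, cap) are black-box models.
    Returns (pri, vol, cap) from the final cycle.
    """
    pri = pri_guess
    for _ in range(max_iter):
        vol = dmd(gnp, pri)            # demand model: {GNP, PRI} -> {VOL}
        pri_calc, cap = sup(vol)       # supply model: {VOL} -> {PRI, CAP}
        if abs(pri_calc - pri) <= tol:
            return pri_calc, vol, cap  # only the final CAP is of interest
        pri = 0.5 * (pri + pri_calc)   # split the difference
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical linear models, used only to exercise the driver.
def dmd(gnp, pri):
    return 100.0 + 0.1 * gnp - 2.0 * pri


def sup(vol):
    return 5.0 + 0.1 * vol, 1.2 * vol


pri, vol, cap = solve_by_iteration(dmd, sup, gnp=1000.0, pri_guess=10.0)
print(round(pri, 4), round(vol, 4), round(cap, 4))
```

The second sequence, which posits VOL instead of PRI, has the same structure with the roles of dmd and sup exchanged. The `RuntimeError` branch stands in for the case where no equilibrium is found.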

In describing the iterative approach we have relied on the notion that
equilibrium values for PRI and VOL exist and that this equilibrium is unique
(i.e., that there are not multiple PRI, VOL pairs that are within the
predetermined thresholds). If the first notion is false, we would have to
terminate the search once the lack of an equilibrium is apparent. If the second
is false, we would have to identify the various equilibria and select an
appropriate one.

Figure 7.5. Metagraph with bridges.

We now turn to a second topic, multiple simple paths connecting two
elements. This may lead to multiple metapaths between the same source and
target. Consider the example of Figure 7.5, in which $e_1$, $e_2$, and $e_3$ are
bridges between their respective elements. For example, it would be impossible
to calculate Rev from Pri and Vol without $e_1$. However, $e_4$ is not a bridge
between {Rev, Exp} and {Notes}, because an alternative edge could be used to
perform the calculation. This can be seen from $a_{Exp,Notes}$, which consists
of two components: one with coinput {Rev} and cooutput {Prof}, and one with no
coinputs or cooutputs.

3. HIERARCHICAL MODELING 

To illustrate hierarchical modeling, we use an example from life cycle
costing. The example, in metagraph form, appears in Figure 7.6, and the
elements (variables) in the generating set are defined in Figure 7.7.
For example, model $e_1$ will calculate the manufacturing cost per vehicle (MC)
and the expected service life (SL) from the design variables (DV), such as the
engine volume and the assembly time.

We now consider the projection of the metagraph in Figure 7.6 over a subset 
of its generating set. Let X' = {DV, MD, PR, LO, TL}. The projection, which 
appears in Figure 7.8, consists of three edges, each of which is a dominant 
metapath in S, and the vertices of which are contained in X' . No elements in 
X\X' = {MC, SL, AO} appear in the projection. 
Figure 7.6. Metagraph for life cycle costing.

...   Annual fuel consumption
AO    Annual operating & support cost
AS    Annual fuel cost
DV    Design variables
FC    Fuel cost
...   Frequency of maintenance
LO    Life cycle ops & support cost
MC    Manufacturing cost
MD    Annual miles driven
PM    Annual cost of maintenance
PR    Price
SL    Expected service life
SN    Sensitivity of op. cost to MD
TL    Total life cycle cost per vehicle

Figure 7.7. Variables used in Figure 7.6.

Figure 7.8. Projection of life cycle costing metagraph.










A. Basu and R. W. Blanning 



The projection provides a high level view of the metagraph that hides certain
details. The projection will be incomplete, and deliberately so - that is, if an
edge $e' = \langle V', W' \rangle$ appears in $S'$, then it is possible to
calculate $W'$ from $V'$ in S, but there may be several other intermediate
variables in $X \setminus X'$ that are also calculated. For example, in
Figure 7.8 we can see that it is possible to calculate the price (PR) given only
the design variables (DV), and the fact that manufacturing cost (MC) is an
intermediate variable is hidden from the person viewing the projection, since
$MC \in X \setminus X'$. In addition, the fact that service life (SL) is also
calculated in the process (by $e_1$) is hidden from the user, because
$SL \in X \setminus X'$.

Each of the edges in the projection may be of interest to one or more
managers or analysts in an organization. For example, $e'_1$ may be of interest
to a marketing manager who wishes to know how the design of a vehicle will
affect its price, $e'_2$ may be of interest to the manager of a car rental
company who wants to evaluate different designs under various mileage
conditions, and $e'_3$ may be of interest to senior managers who wish to know
the profit consequences of various pricing strategies. Thus, a projection may
have several “customers” who see different benefits to the different edges. Of
course, this can also be accomplished by making several projections of a single
metagraph.

The advantage of a projection is that it may disclose relationships that
are implicit in the original metagraph but are not easy to see because of the
size and complexity of the original metagraph. The relationship represented
by $e'_1$ between DV and PR is one example. Another example is provided by
$e'_2$, which represents the invocation of a metapath
$M(\{DV, MD\}, \{LO, TL\}) = \{e_1, e_2, e_3, e_4, e_5\}$. It may not be
immediately clear that this is a dominant metapath for the calculation of both
LO and TL. The third edge in $S'$, $e'_3$, is easily discernible from S, since
it is simply $e_5$.

Thus, in Figures 7.6 and 7.8, $C(e'_1) = \{\{e_1, e_2\}\}$,
$C(e'_2) = \{\{e_1, e_2, e_3, e_4, e_5\}\}$, and $C(e'_3) = \{\{e_5\}\}$. We note
that a composition is not a set of edges, but a set of sets of edges, because
there may be more than one metapath in S corresponding to an edge in $S'$. The
compositions of edges in this and all other projections in this chapter are
shown in Figure 7.9. In the simple examples presented here, there are no
alternate metapaths between any source and target; therefore, each composition
in Figure 7.9 contains a single set of edges.

Thus, a set of models such as the life cycle costing models can be
transformed into a higher-level view in which certain variables and
relationships are retained and others are deliberately hidden from the user. In
the example given above, several managers wished to know about relationships
between the design variables, miles driven, sales price, annual operating
costs, and total life cycle costs. On the other hand, they were not interested
in the manufacturing costs, service life, or the sum of annual operating and
service costs. Where these latter variables appeared as inputs to a model, the
model was discarded in the higher-level view. Where they only appeared as
outputs, the models were retained, but the outputs were restated to exclude
these variables. Where the discarded variables were intermediate outputs in
relationships between variables of interest to the managers, the relationships
appeared in the higher-level views, without the intermediate variables. For
example, a marketing manager could see that the design variables were
sufficient to determine the sale price of the car, without having to consider,
or even be aware of, the intervening variable manufacturing cost.

C(e'_16) (Fig. 7.16)    {{e_3, e_4, e_7}}

Figure 7.9. Compositions of projected edges in various projections.

In addition, several views may be constructed from a single model base. 
In other words, different managers accessing the same model base may have 
different views. For example, a marketing manager negotiating with a major 
customer may be interested only in the design variables, the annual operating 
and service costs, and the total life cycle cost. If we construct a higher-level 
view of the life cycle models with only these variables, we find a single rela- 
tionship: that the design variables and the annual operating and service costs 
are sufficient to compute total life cycle costs. All the other variables are de- 
liberately hidden from this view. 

We have examined the use of metagraphs in constructing views of a single
model base. We now turn to the issue of combining model bases. Consider a
situation where two sets of users have distinct model bases over possibly
overlapping generating sets. Also, consider two users, one in each group, who
visualize their respective model bases in terms of specific projections. For
instance, a marketing manager may use a view that relates demand and
manufacturing cost to price and volume, while a production manager may use a
view that relates batch size and raw material cost to unit manufacturing cost
and mean time between failures. The two managers may want to combine their
resources to solve aggregate problems, such as determining the effect of a
change of batch size on sales volume. The underlying analytical question that
this raises is whether it is sufficient to simply combine the views that the
two managers have of their respective model bases, or whether it is necessary
to combine the model bases and then project them.

The conceptual problem underlying this issue is the combination of two
metagraphs to produce a new metagraph. If the metagraphs are
$S_1 = \langle X_1, E_1 \rangle$ and $S_2 = \langle X_2, E_2 \rangle$, then the
new metagraph, which we will call the sum of $S_1$ and $S_2$, will be
$S_{12} = S_1 + S_2 = \langle X_1 \cup X_2, E_1 \cup E_2 \rangle$. If
$X_1 \cap X_2 \neq \emptyset$, then $S_{12}$ may contain simple paths and
metapaths that are not in either $S_1$ or $S_2$. To see this, consider two
projections $S'_1$ and $S'_2$ of $S_1$ and $S_2$ respectively, where
$S'_1 = \langle X'_1, E'_1 \rangle$ is the projection of $S_1$ over
$X'_1 \subseteq X_1$ and $S'_2 = \langle X'_2, E'_2 \rangle$ is the projection
of $S_2$ over $X'_2 \subseteq X_2$. If we combine the two views, we get a
metagraph $S'_1 + S'_2 = \langle X'_1 \cup X'_2, E'_1 \cup E'_2 \rangle$. We
would like to know whether any information about relationships between elements
of $X'_1 \cup X'_2$ is lost in this process. That is, we would like to know
whether $S'_1 + S'_2$ contains the same information as the metagraph
$S'_{12} = \langle X'_1 \cup X'_2, E'_{12} \rangle$, which is the projection of
$S_{12}$ over $X'_1 \cup X'_2$.
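The sum of two metagraphs is simple to compute, and a small example shows how it can contain connectivity that neither summand has. The sketch below is illustrative only: the element names (BATCH, RAW, MCOST, DEMAND, PRICE, VOLUME), the edge names, and the helper `computable` are hypothetical, loosely following the marketing and production managers described above, and edge names are assumed to be unique across the two metagraphs.

```python
def metagraph_sum(s1, s2):
    """Sum of two metagraphs, each given as a (generating_set, edges) pair."""
    (x1, e1), (x2, e2) = s1, s2
    return x1 | x2, {**e1, **e2}


def computable(edges, source):
    """Elements that can be calculated from `source` by forward chaining."""
    known = set(source)
    changed = True
    while changed:
        changed = False
        for invertex, outvertex in edges.values():
            if invertex <= known and not outvertex <= known:
                known |= outvertex
                changed = True
    return known


# Hypothetical production and marketing model bases sharing MCOST.
S1 = ({"BATCH", "RAW", "MCOST"},
      {"prod": ({"BATCH", "RAW"}, {"MCOST"})})
S2 = ({"DEMAND", "MCOST", "PRICE", "VOLUME"},
      {"mktg": ({"DEMAND", "MCOST"}, {"PRICE", "VOLUME"})})

X12, E12 = metagraph_sum(S1, S2)
source = {"BATCH", "RAW", "DEMAND"}
print("VOLUME" in computable(S1[1], source))  # production models alone
print("VOLUME" in computable(S2[1], source))  # marketing models alone
print("VOLUME" in computable(E12, source))    # the sum of the two
```

Because the generating sets overlap in MCOST, sales volume becomes computable from batch size, raw material cost, and demand only in the sum.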

We note that $S'_{12}$ dominates the sum $S'_1 + S'_2$, but the converse need
not hold. That is, there may be some edges in $S'_{12}$ that are not dominated
by any edges in $S'_1 + S'_2$. To illustrate this, we expand the life cycle
costing example described in the previous section. Consider a set of four cost
estimating relationships (variables are defined in Figure 7.7). The metagraph is
illustrated in Figure 7.10. The four edges $e_6, \ldots, e_9$ are cost
estimating relationships that would be constructed by engineers and used by the
sales force. Figure 7.11 contains the projection of this metagraph over
{FC, PM, MD, SL, AO}. The edge $e'_4$ allows us to calculate AO from FC, PM and
MD without consideration of the intervening variable, AS, and its composition
$C(e'_4)$ consists of a single set of two edges. The calculation represented by
$e'_5$ is simply that represented by $e_7$, so $C(e'_5) = \{\{e_7\}\}$.

We will denote the metagraph in Figure 7.6 as $S_1$, Figure 7.8 as $S'_1$,
Figure 7.10 as $S_2$, and Figure 7.11 as $S'_2$. Also
$X'_1 = \{DV, MD, PR, LO, TL\}$ and $X'_2 = \{FC, PM, MD, SL, AO\}$; thus,
$X'_1 \cup X'_2 = \{DV, MD, PR, LO, TL, FC, PM, SL, AO\}$. The joint metagraph -
that is, the sum of $S_1$ and $S_2$ - appears in Figure 7.12, and its projection
over $X'_1 \cup X'_2$, giving $S'_{12}$, appears in Figure 7.13.

The sum of the projections, $S'_1 + S'_2$, appears in Figure 7.14. We can see
from Figures 7.13 and 7.14 that $S'_{12}$ dominates $S'_1 + S'_2$. For example,
some edges, such as $\langle \{PR, LO\}, \{TL\} \rangle$, appear in both
$S'_{12}$ and $S'_1 + S'_2$. It appears as $e'_{10}$ in $S'_{12}$ and $e'_3$ in
$S'_1 + S'_2$. In this case the compositions of the relevant edges are the same:
$C(e'_{10}) = C(e'_3) = \{\{e_5\}\}$. On the other hand, the edge
$e'_4 = \langle \{MD, FC, PM\}, \{AO\} \rangle$ in $S'_1 + S'_2$ does not appear
in $S'_{12}$, but it is dominated by the edge
$\langle \{MD\}, \{AO, SL\} \rangle$ in $S'_{12}$. In addition, there are edges
in $S'_{12}$ that are not dominated by any edge in $S'_1 + S'_2$. An example is
$\langle \{DV\}, \{PR, SL\} \rangle$. Although the edge
$e'_1 = \langle \{DV\}, \{PR\} \rangle$ is found in $S'_1 + S'_2$, there is no
edge in $S'_1 + S'_2$ with DV in its invertex and SL in its outvertex.

Figure 7.10. Metagraph of cost estimating relationships.

Figure 7.11. Projection of cost estimating metagraph.

A possible negative consequence of integrating high-level views of a model
base is that information may be lost. In the metagraph $S'_1 + S'_2$, we find
that mileage driven, along with fuel cost and the annual cost of preventive
maintenance, is sufficient to determine annual operation and support cost.
However, we do not find that mileage driven by itself is sufficient to determine
annual operation and maintenance cost, because the latter variable is not in the
set of elements over which the life cycle costing metagraph was projected. In
addition, the fact that the design variables are sufficient to determine
expected service life is missing from the sum of the projections, even though
both variables are found in the sum of the projections, because expected service
life was not in the set of elements over which the life cycle costing metagraph
was projected. This may give the misleading impression that service life can be
calculated only from miles driven, even though it can also be calculated from
the design variables.

Figure 7.12. The joint metagraph.

Figure 7.13. Projection of the joint metagraph.

The condition $X_1 \cap X_2 = X'_1 \cap X'_2$ did not hold in the previous
example, because $X_1 \cap X_2 = \{MD, SL, AO\}$ but
$X'_1 \cap X'_2 = \{MD\}$. Thus there were elements (SL and AO) common to the
generating sets of the metagraphs that were not common to the sets of elements
over which the two metagraphs were projected.

Figure 7.14. The sum of the projections (Figures 7.8 and 7.11).

Figure 7.15. Second projection of Figure 7.6.

We now construct an example in which the condition is satisfied. We
will retain the projection of $S_2$ over {FC, PM, MD, SL, AO} (in Figure 7.11)
but project $S_1$ over a new $X'_1 = \{DV, SL, MD, AO, LO\}$ (in Figure 7.15).
In this case $X_1 \cap X_2 = X'_1 \cap X'_2 = \{MD, SL, AO\}$ - that is, all
of the elements that are found in the generating sets of both $S_1$ and $S_2$
are also in the sets of elements over which $S_1$ and $S_2$ are projected. The
union of the sets over which the metagraphs are projected is
$X'_1 \cup X'_2 = \{DV, SL, MD, AO, LO, FC, PM\}$, and the projection of
$S_{12}$ over this set, which is a new $S'_{12}$, is shown in Figure 7.16. The
sum of the projections, $S'_1 + S'_2$, appears in Figure 7.17.

Figure 7.16. Second projection of the joint metagraph.

Figure 7.17. The sum of Figures 7.11 and 7.15.
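The condition on common elements reduces to a comparison of two set intersections, which can be checked directly for both examples. In the sketch below, the function name is hypothetical, $X_1$ is taken from the text ($X'$ together with {MC, SL, AO}), and the contents of $X_2$ are an assumption: only its intersection with $X_1$, namely {MD, SL, AO}, is stated in the text.

```python
def common_elements_preserved(x1, x2, x1_proj, x2_proj):
    """True if every element shared by the generating sets is also shared
    by the sets over which the two metagraphs are projected."""
    return (x1 & x2) == (x1_proj & x2_proj)


X1 = {"DV", "MD", "PR", "LO", "TL", "MC", "SL", "AO"}
X2 = {"FC", "PM", "MD", "SL", "AO", "AS"}  # assumed; may contain more elements

X2_PROJ = {"FC", "PM", "MD", "SL", "AO"}         # Figure 7.11
X1_PROJ_FIRST = {"DV", "MD", "PR", "LO", "TL"}   # Figure 7.8
X1_PROJ_SECOND = {"DV", "SL", "MD", "AO", "LO"}  # Figure 7.15

print(common_elements_preserved(X1, X2, X1_PROJ_FIRST, X2_PROJ))
print(common_elements_preserved(X1, X2, X1_PROJ_SECOND, X2_PROJ))
```

The first call reports that the condition fails, since only MD is shared by the projected sets, and the second reports that it holds.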

We can see as before that $S'_{12}$ dominates $S'_1 + S'_2$. For example the
edge $\langle \{DV\}, \{SL\} \rangle$ is present in both $S'_{12}$ and
$S'_1 + S'_2$, whereas in the previous example it was not present in
$S'_1 + S'_2$. The edge $e'_4 = \langle \{MD, FC, PM\}, \{AO\} \rangle$ is in
$S'_1 + S'_2$. We can also see that the converse is true; $S'_1 + S'_2$
dominates $S'_{12}$. For example, the edge $e'_{16}$ in $S'_{12}$ is dominated
by the metapath $\{e'_5, e'_{12}, e'_{13}\}$ in $S'_1 + S'_2$.

We can also see that equivalence (i.e., mutual dominance) does not
necessarily imply equality. The metagraphs in Figures 7.16 and 7.17 are not the
same, in part because of the existence of edge
$e'_4 = \langle \{MD, FC, PM\}, \{AO\} \rangle$ in $S'_1 + S'_2$. However, this
edge does not destroy the equivalence of $S'_{12}$ and $S'_1 + S'_2$, since it
is dominated by $e'_{12}$ in $S'_1 + S'_2$. In addition, the edge
$\langle \{DV, MD\}, \{LO\} \rangle$ in $S'_1 + S'_2$ does not appear in
$S'_{12}$, but it is dominated by the edge $e'_{16}$ in $S'_{12}$, which in turn
does not appear in $S'_1 + S'_2$ but is dominated by the metapath
$\{e'_5, e'_{12}, e'_{13}\}$ in $S'_1 + S'_2$.

Although $S'_{12}$ and $S'_1 + S'_2$ are equivalent, there is a difference
between them: $S'_{12}$ seems simpler than $S'_1 + S'_2$. One reason for this is
that a projection such as $S'_{12}$ only contains dominant metapaths, whereas
the sum of two projections, such as $S'_1 + S'_2$, need not. For example, in
Figure 7.17, the edge $e'_4 = \langle \{MD, FC, PM\}, \{AO\} \rangle$ is in
$S'_1 + S'_2$, but not in $S'_{12}$, because it is dominated by $e'_{12}$.
Another reason for the simplicity of $S'_{12}$ is the requirement that no two
edges can have the same invertex. Thus, edge $e'_{16}$ in $S'_{12}$ corresponds
to three edges - $e'_5$, $e'_{12}$, and $e'_{13}$ - in $S'_1 + S'_2$. As before,
there is no such simplifying requirement for the sum of two projections, only
for a single projection.

We have seen that there is a simple criterion for the integration of two views 
that avoids the misleading impression that a calculation cannot be performed 
(e.g., that expected service life cannot be calculated from the design variables) 
when in fact the calculation can be performed. The requirement is that all 
variables common to the two sets of calculations (i.e., the life cycle costing 
calculations and the cost estimating relationships) also be in both of the sets 
of variables used to construct the higher-level views. We have also seen that 
two views can be equivalent without being identical. In our second example, 
the sum of the higher-level views of the life cycle costing calculations and of 
the cost estimating relationships was equivalent to a single view constructed 
directly from both sets of calculations. 

At the same time, the ways in which these relationships were presented to 
the user were different in two respects. First, the sum of the high-level views 
contained redundant information, in the form of dominated metapaths. For ex- 
ample, the sum of the views disclosed that miles driven, fuel cost, and annual 
cost of preventive maintenance are sufficient to calculate annual operation and 
support cost, but it also disclosed that miles driven is sufficient to calculate an- 
nual operation and support cost. The second difference is that several relation- 
ships in the sum of the higher-level views may appear as only one relationship 
in the direct view of both sets of calculations. For example, the sum of views 
contained separate relationships between miles driven and service life, miles 
driven and annual operations and service cost, and miles driven and life cycle 
operation and support cost. In the direct view this is presented to the users as 
a single relationship: miles driven is sufficient to calculate all three variables. 
Thus, the users can easily see that several of the variables of interest to them 
are determined by a single variable, miles driven. 
4. ASSUMPTIONS IN MODEL BASES

We now turn to the use of assumptions in model bases. We define an as- 
sumption as a proposition (i.e., a statement that may be true or false) associ- 
ated with a model that must be true if the model is to be used in a particular 
instance. In metagraph terms, an assumption associated with an edge must be 
true for the edge to appear in a metapath. 

In order to explore this we must use a conditional metagraph, in which the
generating set is partitioned into two subsets. The first is a set of variables,
denoted $X_v$, and the second is a set of propositional statements, denoted
$X_p$. Each $x \in X_v$ represents a variable, such as revenue, production level
or inflation rate. Each $p \in X_p$ represents a proposition, such as “The
inflation rate is five percent or less”, or “INFL ≤ 0.05”.

A variable that appears in the invertex of an edge represents an input to
the model represented by the edge. However, a propositional statement that
appears in the invertex of an edge does not represent an input but rather an
assumption that must be true for the model to be valid. For instance, if p =
“INFL < 0.05” is in the invertex of an edge e representing a model, then the
model only applies when the inflation rate is less than 5%. When a
proposition appears in the outvertex of an edge, the edge represents a procedure
for determining whether the proposition is true or false. For example, an edge
$\langle \{PRI\}, \{p\} \rangle$ where p = “PRI < 10” represents a procedure for
evaluating the proposition “PRI < 10” from the value of PRI.

Conditional metagraphs must meet two constraints in addition to the
requirement that the generating set be partitioned into variables and
propositions. First, for each edge, at least one of the vertices must be
nonempty, and the invertex and outvertex of each edge must be disjoint. Second,
if an outvertex contains a proposition, then it cannot contain any other
element. Even with these constraints, a conditional metagraph is a
generalization of the type of metagraph discussed above. That is, these
previously defined metagraphs can be viewed as conditional metagraphs in which
$X_p = \emptyset$.
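The two constraints can be checked edge by edge. The following sketch is a hypothetical validator (the function name `check_edge` and the example edges are assumptions for illustration); it implements only the two constraints as stated above and reports every violation it finds.

```python
def check_edge(invertex, outvertex, propositions):
    """Return a list of constraint violations for one conditional edge."""
    problems = []
    if not invertex and not outvertex:
        problems.append("both vertices are empty")
    if invertex & outvertex:
        problems.append("invertex and outvertex are not disjoint")
    if outvertex & propositions and len(outvertex) > 1:
        problems.append("a proposition must be alone in its outvertex")
    return problems


PROPOSITIONS = {"p", "q"}  # hypothetical propositions

# A model whose validity depends on assumption p: valid.
print(check_edge({"PRI", "p"}, {"VOL"}, PROPOSITIONS))
# A procedure that evaluates p from PRI: valid.
print(check_edge({"PRI"}, {"p"}, PROPOSITIONS))
# Two propositions sharing one outvertex: invalid.
print(check_edge({"PRI"}, {"p", "q"}, PROPOSITIONS))
```

With `propositions` empty, the second constraint can never be violated, which mirrors the remark that ordinary metagraphs are the special case with no propositions.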

The constraints on conditional metagraphs are illustrated in Figure 7.18,
which represents a price-volume relationship. In Figure 7.18 (parts (a), (b),
(c)) the assumption p depends on an input variable, two variables not in the
input (i.e., inflation rate and a price index), and on no variable at all,
respectively. In the last case (Figure 7.18(c)), which is called a pure input
(as it depends on no other elements), the user would be asked whether the
proposition p is true. In Figure 7.18(d) the validity of the model depends on
the volume, which is not allowed, since volume is an output of the model.
Finally, the edge in Figure 7.18(e) is invalid because two propositions occupy a
single outvertex.

The use of propositions to represent assumptions is reasonable, since it
allows the representation of any assumption that can be stated as a declarative
sentence, and this is true of most assumptions in modeling. The simplicity of
the propositional representation facilitates the inclusion of assumptions in a
metagraph representation of a model base and allows a variety of important
model management issues to be addressed. Also, the use of metagraph edges
to evaluate propositions is quite general. This black box representation implies
that the procedure to evaluate the proposition can have any structure, such as a
simple calculation, an estimation model, or even a rule (in the case of an edge
whose invertex consists of only propositions). While metagraph edges can be
used to represent rules and rule based inference, modeling inference is outside
the scope of this chapter. Thus, given an edge with a proposition as its
outvertex, we simply use this to mean that given (appropriate) values of the
invertex elements, the outvertex proposition can be evaluated.

Figure 7.18. Valid and invalid edges containing propositions. Panels: (a) valid
edges; (b) valid edges; (c) valid edge.

A conditional metapath establishes a relationship between two sets of
variables, using whatever assumptions are necessary to execute the necessary
edges. The relevant assumptions are those that appear in the metapath. The
proposition for each of these assumptions appears in the invertex of at least
one of the edges in the conditional metapath, and it may appear in one of the
outvertices as well. If a proposition does not appear in any of the outvertices,
it is a member of the set of initial assumptions; otherwise, it is a member of
the set of intermediate assumptions.

The set of initial assumptions can be evaluated before any of the edges
in the conditional metapath are executed. The intermediate assumptions are
evaluated based on some of the outvertex elements of edges in the metapath,
once the values of those elements are known. The truth or falsity of any other
assumptions - that is, any assumptions not in the relevant set - will have no
impact on the effectiveness of the conditional metapath in linking the source
elements to the target elements.

In order to illustrate these concepts, we present a simple example: consider
the metagraph in Figure 7.19 and the conditional metagraph in Figure 7.20. In
this example, the variables are $X_v$ = {ADV, CAP, CC, ECON, EQT, EXP, NI,
PRI, REV, STK, UCOST, VOL} (these variables are defined in the caption of
Figure 7.19). The propositions are $X_p$ = {adl, cadv, inf, mkt, pkv, vdsk},
where adl means “advertising expense per unit is less than 25 percent of
unit cost”, cadv means “competitive advertising does not increase more than
20 percent”, inf means “inflation is less than 10 percent”, mkt means “market
conditions are stable”, pkv means “peak volume does not exceed 3 million
units”, and vdsk means “there are no volume discounts”. The set of edges is
$E = \{e_1, e_2, \ldots, e_9\}$, where $e_1$ is a pricing model for computing
price whenever inflation is less than 10 percent, $e_2$ is a sales forecasting
model that calculates sales when competitive advertising increases by no more
than 20 percent, $e_3$ is a revenue forecasting model that applies when peak
volume is no more than 3 million units and no volume discounts apply, $e_4$ is
an accounting model that calculates total expense when advertising expenses are
less than 25 percent of unit cost, $e_5$ computes net income, $e_6$ is a
financial model computing cost of capital and stock price under stable market
conditions, $e_7$ calculates unit cost, $e_8$ determines whether peak volume
exceeds 3 million units, and $e_9$ determines whether adl is true.

If we consider the conditional metapath
$M(\{ADV, CAP, ECON, UCOST\}, \{NI\}) = \{e_1, e_2, e_3, e_4, e_5, e_8, e_9\}$
from {ADV, CAP, ECON, UCOST} to {NI}, then we have
$\alpha = \{adl, cadv, inf, pkv, vdsk\}$, $\beta = \{cadv, inf, vdsk\}$, and
$\gamma = \{adl, pkv\}$. That is, all of the assumptions except for mkt must be
true for the metapath to represent a valid integrated model. However, only pkv
and adl are intermediate assumptions that must be evaluated using $e_8$ and
$e_9$; the others do not depend on any variables calculated during the execution
of the edges in the metapath, and therefore can be evaluated at the outset.
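The relevant, initial, and intermediate assumption sets for this metapath can be computed from the invertices and outvertices alone. In the sketch below, the placement of propositions follows the edge descriptions in the text, but the variables listed in each invertex are assumptions made for illustration, since Figure 7.20 is not reproduced here. The function name `classify_assumptions` is also hypothetical.

```python
PROPOSITIONS = {"adl", "cadv", "inf", "mkt", "pkv", "vdsk"}

# name: (invertex, outvertex). Propositions follow the text; the variables in
# each invertex are ASSUMED (Figure 7.20 is not available).
EDGES = {
    "e1": ({"ECON", "inf"}, {"PRI"}),
    "e2": ({"ADV", "PRI", "cadv"}, {"VOL"}),
    "e3": ({"PRI", "VOL", "pkv", "vdsk"}, {"REV"}),
    "e4": ({"ADV", "UCOST", "VOL", "adl"}, {"EXP"}),
    "e5": ({"REV", "EXP"}, {"NI"}),
    "e8": ({"VOL", "CAP"}, {"pkv"}),
    "e9": ({"ADV", "UCOST"}, {"adl"}),
}


def classify_assumptions(metapath, edges, propositions):
    """Return (relevant, initial, intermediate) assumptions of a metapath."""
    in_props = set().union(*(edges[e][0] for e in metapath)) & propositions
    out_props = set().union(*(edges[e][1] for e in metapath)) & propositions
    relevant = in_props                   # appear in some invertex
    intermediate = relevant & out_props   # also evaluated by some edge
    initial = relevant - intermediate     # can be evaluated at the outset
    return relevant, initial, intermediate


metapath = ["e1", "e2", "e3", "e4", "e5", "e8", "e9"]
relevant, initial, intermediate = classify_assumptions(
    metapath, EDGES, PROPOSITIONS)
print(sorted(relevant), sorted(initial), sorted(intermediate))
```

The assumption mkt does not appear in any of the three sets, because the only edge that requires it, $e_6$, is not part of the metapath.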

Note that the interpretation of a metapath as a specification of an integrated 
model with the source elements as inputs and the target elements as outputs is 
slightly modified in the case of a conditional metagraph. In a simple metagraph 
(where $X_p = \emptyset$), given the values of the source elements, the integrated model
represented by a metapath M(B,C) unconditionally computes values for the 
target elements. However, in a conditional metagraph, just as the validity of a 
single edge is conditional upon the satisfiability of its assumptions, similarly, 
the validity of a conditional metapath as an integrated model is conditional 
upon the satisfiability of all the applicable assumptions in its edges. 

When multiple assumptions are relevant to a problem instance, one question
that arises is whether the assumptions are mutually consistent. For example, if
two assumptions occurring in a metapath are $p_1$: “the inflation rate is less
than 7 percent” and $p_2$: “the inflation rate is at least 10 percent”, then
clearly there is no feasible context in which both of these assumptions could be
true. There is a substantial literature on consistency and integrity checking in
knowledge bases (e.g., de Kleer, 1986; Grant and Minker, 1991; Illarramendi,
Blanco and Goni, 1994), and there are relatively simple procedures for
consistency checking in propositional knowledge bases. However, since we do not
use negated propositions in our approach, the knowledge base is implicitly
consistent. Thus, recognition that $p_1$ and $p_2$ above are inconsistent would
require manual reasoning. Toward this end, detection of the potential “conflict
set” of propositions in a metapath $M$ can be done by identifying the set
$\left(\bigcup_{e \in M} V_e\right) \cap X_p$, where $V_e$ is the invertex of
edge $e$.

Figure 7.19. Model base metagraph: ADV - annual ad level; CAP - prod. capacity; CC - cost
of capital; ECON - econ. indicator; EQT - total equity; EXP - total expense; NI - net income;
PRI - sales price; REV - annual revenue; STK - stock price; UCOST - unit cost; VOL - vector
of monthly sales.

We also introduce the concept of a critical assumption. If there are two or 
more conditional metapaths from a given source to a given target, then the 
edges to be executed depend upon the particular metapath selected. In addi- 
tion, the assumptions to be invoked may also depend upon the choice of meta- 
path. Thus, the user may consider two factors in deciding how to calculate the 
target elements. The first is the set of models (edges) to be used, and the sec- 
ond is the set of applicable assumptions (and possibly even the issue of which 
assumptions are initial and which are intermediate). We define the set of critical
assumptions for a given source and target as the intersection of the sets
of relevant assumptions for all metapaths connecting the source to the target.
Thus, the critical assumptions are those that must hold if the available model
base is to be used to compute the target elements given the source elements.

Figure 7.20. Conditional metagraph for model base in Figure 7.7: adl - ADV < 25% of
UCOST; inf - inflation < 10%; pkv - peak volume < 3 MM; cadv - competitor’s ad exp.
increase < 20%; mkt - market conditions stable; vdsk - no volume discounts.
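Because critical assumptions are defined as an intersection, the computation is short once the relevant assumptions of each candidate metapath are known. The sketch below assumes those sets have already been collected; the function name and the two assumption sets are hypothetical illustrations, not values taken from the figures.

```python
from typing import Dict, Set


def critical_assumptions(relevant: Dict[str, Set[str]]) -> Set[str]:
    """Intersect the relevant-assumption sets of every candidate metapath."""
    sets = list(relevant.values())
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical assumption sets for two alternative metapaths between the
# same source and target (illustrative values only).
relevant_by_metapath = {
    "M1": {"inf", "pkv", "vdsk"},
    "M2": {"inf", "mkt"},
}
critical = critical_assumptions(relevant_by_metapath)
```

Here only inf is needed no matter which metapath is chosen, so it is the single critical assumption in this toy case.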

So far, we have discussed the use of metagraphs to support manipulation of 
models in a “flat” collection of models. However, in situations where the model 
base is quite large, it is useful to extract relevant views of the model base. In the 
metagraph representation of a model base, such views can be defined in two 
ways, as described in this section. We first discuss the concept of a context, and 
the use of assumptions in metagraphs to define contexts for problem solving. 
We then examine the role of projections, which provide a second mechanism 
for defining specialized views of a metagraph, and the use of projections in 
conditional metagraphs. Then we present a useful result about the relationship 
between these two types of views. 

To define a context in a conditional metagraph, one begins by partition- 
ing the assumptions in the metagraph into three sets: those known to be true, 
those known to be false, and those whose truth values are unknown. Then, the 
conditional metagraph is simplified so that only the last set of assumptions is 
present.

Figure 7.21. Context K({inf, pkv}, {cadv, mkt}, S).



In transforming a conditional metagraph into a context, the assumptions
whose truth values are known (i.e., $P \cup Q$) no longer appear and need not be
considered in the model selection process. Thus, a context represents a sim- 
plified view of a model base that allows a user to consider only those models 
known to be relevant, and those variables and assumptions whose values can 
be manipulated (e.g., in a sensitivity analysis). The larger the sets P and Q 
(i.e., the more specific the context), the simpler is the resulting conditional 
metagraph. 

Consider again the conditional metagraph in Figure 7.19. If we know that 
propositions inf and pkv are true, and that cadv and mkt are false, then the 
resulting context K({inf, pkv}, {cadv, mkt}, S) is the conditional metagraph
shown in Figure 7.21. Since vdsk and adl are the only propositions whose 
truth values are not specified in the context, they are the only assumptions 
appearing in Figure 7.21. 
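One plausible reading of the context construction can be sketched as follows. The assumptions are ours: edges whose invertex contains a proposition known to be false are dropped, propositions known to be true are removed from the remaining invertices, and edges that only evaluate an already known proposition are dropped as well. The `Edge` type, `build_context`, and the four-edge `toy_base` are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Edge(NamedTuple):
    name: str
    invertex: FrozenSet[str]
    outvertex: FrozenSet[str]


def build_context(
    edges: List[Edge], true_props: Set[str], false_props: Set[str]
) -> List[Edge]:
    """Return a simplified view; the underlying model base is left untouched."""
    known = true_props | false_props
    view = []
    for e in edges:
        if e.invertex & false_props:
            continue  # the model rests on an assumption known to be false
        outvertex = e.outvertex - known
        if not outvertex:
            continue  # the edge only evaluated a proposition already known
        view.append(Edge(e.name, e.invertex - true_props, outvertex))
    return view


# Hypothetical model base (illustrative only).
toy_base = [
    Edge("a", frozenset({"ADV", "inf"}), frozenset({"VOL"})),
    Edge("b", frozenset({"VOL", "vdsk"}), frozenset({"REV"})),
    Edge("c", frozenset({"EQT", "mkt"}), frozenset({"STK"})),
    Edge("d", frozenset({"VOL"}), frozenset({"pkv"})),
]
view = build_context(toy_base, true_props={"inf", "pkv"}, false_props={"mkt"})
```

The function returns a new list, which mirrors the point that a context is only a view over the model base.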

Note that elimination of models from a context does not mean that these 
models are removed from the model base, since a context is only a view. 
Rather, these models are not considered for problems defined within that con- 
text. 

Model management systems, like data management systems, are often used 
by several people or groups, each of whom has a different purpose in mind. 
Thus it may be convenient, or even necessary, to present each user or user
group with a specialized view of the model base. This has two advantages.
The first is convenience: the users are not burdened with information about
models that they do not need to arrive at decisions or judgements that concern
them. The second is security: if certain users should not be granted access to
certain models, it is better that they do not even know that those models exist.

Figure 7.22. N({ADV, ECON, UCOST, VOL}, S).

We note two interesting features of conditional metagraphs. First, the variables
in the projection are limited to those in X', and any variables in X \ X'
will not appear. Some or all of the assumptions in $X_P$ may appear in the
projection; the user only specifies X', and the necessary assumptions are
determined by the definition. Second, the projection (and therefore its adjacency
matrix) represents all relationships among the variables in X', and
$a'^{*}_{ij} = \emptyset$ iff $a'_{ij} = \emptyset$
(where A' is the adjacency matrix of the projection).

The first observation implies that the decision maker needs only to specify 
the relevant variables over which the projection is desired, and the operation 
then generates the relevant assumptions for each projected relationship. The 
second observation implies that use of projections does not require the
computation of the closure of A' (i.e., A'*), which saves some computational effort.
The projection itself can be computed using the A* matrix of the underlying 
metagraph. Although the complexity of the procedure is exponential in the 
number of triples in the relevant portion of the A* matrix, the size of this por- 
tion depends upon the projection set (the elements in the generating set that 
define the projection). Since in practice this set is not large (otherwise, the 
benefit of constructing the projection is lost), the procedure is still practical. 

Consider again the conditional metagraph illustrated in Figure 7.19. If
we project this metagraph over the set of variables X' = {ADV, ECON, NI,
UCOST, VOL} $\subseteq X_V$, the result is the conditional metagraph illustrated
in Figure 7.22. Two propositions, cadv and mkt, do not appear in the projection.
The reason is that cadv appears only in the invertex of $e_2$ and mkt appears only
in the invertex of $e_6$, and neither of these edges is a member of a dominant
metapath corresponding to the edge e" in the projection.

Figure 7.23. N({ADV, ECON, NI, UCOST, VOL}, K({inf, pkv}, {cadv, mkt}, S)) = K({inf, pkv},
{cadv, mkt}, N({ADV, ECON, NI, UCOST, VOL}, S)).

We illustrate this commutativity property using our earlier example and the
context and projection on it, shown in Figures 7.21 and 7.22, respectively.
It should be easy to see that the projection of the conditional metagraph
in Figure 7.21 over X' = {ADV, ECON, NI, UCOST, VOL} is the conditional
metagraph in Figure 7.23, which is also the result of defining the context
P = {inf, pkv}, Q = {cadv, mkt} for the metagraph in Figure 7.22.

We have seen that metagraphs, as a tool for model management, can be
extended to incorporate assumptions about models. We have also defined a
context as a view of a model base, and have compared it to another type of
view, a projection. We now examine how these ideas can be applied to certain
questions that might arise about assumptions and their role in model management,
and illustrate these applications using the example in Figure 7.19.

The first question is: given a model base represented by a conditional
metagraph S, a source set of variables B whose values are known, a target set of
variables C whose values are to be computed, a set of propositions P that are
known to be true and a set of propositions Q that are known to be false, can we
determine the values of the variables C? This problem can be formulated as:
is there a metapath from B to C in the context K(P, Q, S)? Equivalently, we
can ask, if we project K(P, Q, S) over $B \cup C$, is there an edge (B', C) (with
$B' \subseteq B$) in the projection? Since context and projection are commutative,
this is the same as asking whether, if we project S over $B \cup C$ and then
construct the context of the projection over (P, Q), the resulting metagraph has
an edge (B', C). The solution to this problem consists of two steps: (1)
construction of the context metagraph K (and its closure A*), and (2) finding a
metapath M(B, C) in K. For example, in Figure 7.21, this analysis can be used to
find that given ECON, UCOST, ADV, and VOL, it is possible to compute NI in the
context K.

A second, related question is, given the same information, if there is no such 
metapath, how can we modify the problem (without modifying the model base) 
so that the target elements can be calculated? Three possible modifications that 
can be pursued are as follows: 

1. Expand the source set B by determining values for some additional 
variables. An analysis along these lines can help determine whether it 
is worth the additional effort needed to evaluate those additional variables.
For example, in Figure 7.21 a failed search for a metapath from
ADV, ECON, and UCOST to NI can show (through the identification of
coinputs of candidate triples in A*) that knowledge of VOL and satisfaction
of vdsk will yield a valid metapath.

2. Identify those variables in C that are not computable given B, and con- 
sider the possibility of removing them from the target set C (for in- 
stance, if they are only marginally useful or not critical for the deci- 
sion problem at hand). Such variables can be identified as a by-product 
of the initial procedure for searching for a metapath from B to C using
A*. For example, in Figure 7.21, if B = {ECON, UCOST, VOL} and
C = {EXP, REV}, then no metapath exists; however, if we remove EXP
from C, then there is a metapath. 

3. Modify the context, by removing one or more assumptions from Q, so 
that certain models earlier invalid can now be considered. This can be 
achieved by incrementally removing assumptions and sets of assump- 
tions from Q and repeating the metapath search procedure described 
earlier. For example, if B = {ADV, ECON, EQT, UCOST, VOL}, and
C = {STK}, then there is no metapath M(B, C) in the context of Figure 7.21.
However, if mkt is removed from Q, then a metapath can indeed be found.
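The first two modifications can be explored with a simple reachability check. The sketch below swaps in plain forward chaining over the edges of a context in place of the search over A* described in the text, so it answers only whether the targets are computable and does not return a dominant metapath. The edge tuples, function names, and `toy_context` are hypothetical and are not the edges of Figure 7.21.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, Set[str], Set[str]]  # (name, invertex, outvertex)


def forward_closure(edges: List[Edge], source: Iterable[str]) -> Set[str]:
    """All elements obtainable from the source by repeatedly firing edges."""
    known = set(source)
    changed = True
    while changed:
        changed = False
        for _name, invertex, outvertex in edges:
            if invertex <= known and not outvertex <= known:
                known |= outvertex
                changed = True
    return known


def uncomputable(
    edges: List[Edge], source: Iterable[str], target: Iterable[str]
) -> Set[str]:
    """Target elements that cannot be derived from the source."""
    return set(target) - forward_closure(edges, source)


# Hypothetical context (illustrative only).
toy_context: List[Edge] = [
    ("price", {"UCOST", "ECON"}, {"PRI"}),
    ("revenue", {"VOL", "PRI"}, {"REV"}),
    ("expense", {"VOL", "UCOST", "ADV"}, {"EXP"}),
]
missing = uncomputable(toy_context, {"ECON", "UCOST", "VOL"}, {"EXP", "REV"})
missing_after = uncomputable(
    toy_context, {"ADV", "ECON", "UCOST", "VOL"}, {"EXP", "REV"}
)
```

In the toy context EXP is the only uncomputable target, so one can either drop it from C or add ADV to B; the third modification would amount to rebuilding the context with a smaller Q and repeating the check.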

A third question is as follows: given a model base, source and target sets
B, C respectively, what assumptions are necessary for computing C? This
problem can be formulated as the identification of the set of assumptions that
cannot be in Q for a conditional metapath M(B, C) to exist in $K(\emptyset, Q, S)$.
This is a useful question in practice, since it identifies those assumptions that
are critical for the given decision problem. In order to solve this, once again the
adjacency matrix of the conditional metagraph can be exploited. For example,
in Figure 7.20, the propositions adl, inf, pkv and vdsk are critical (i.e., cannot
be in Q) for a metapath M({ADV, ECON, UCOST, VOL}, {NI}) to exist.

A fourth question, related to the previous one, is: given S, B, C as before,
and a specific proposition p, is p a necessary assumption for C to be calculated
from B? This is a more specific problem than the identification of all
necessary assumptions, and can be addressed using a simpler procedure. For
example, in Figure 7.18, if B = {EQT, EXP, PRI, VOL} and C = {STK}, then
the procedure could be used to show that vdsk is required (i.e., vdsk cannot be
in Q) while adl is not essential (i.e., adl can be in Q) for a metapath M(B, C)
to exist.

Thus, a number of important questions regarding assumptions and models 
can be formulated as questions of connectivity in conditional metagraphs, 
or in specific views of such metagraphs. Furthermore, these problems can be 
solved by visual inspection of the pictorial representation of the metagraph 
if the model base (or at least the relevant portion of it) is small, as well as 
through structured procedures on the algebraic representation of the metagraph 
in situations which are more complex. These procedures can also be used as 
a basis of sensitivity analysis, by analyzing the effect of changing the truth 
value of one or more propositions. This implies reapplying the procedures 
with modified contexts. 

It should be noted that this is not the only type of sensitivity analysis used in 
modeling. Another involves changing values of one or more inputs (metapath 
source variables) to the decision problem, and studying the resulting impact on 
the model outcomes (metapath target variables). Unlike the assumption-based 
analysis described above, this requires actual execution of the models them- 
selves, with different values for the input variables. During this type of analy- 
sis, an intermediate assumption that was true for one set of inputs may become 
false for another. Metagraphs are also useful for such analyses. For instance, 
for each intermediate assumption in a metapath, one can determine which input
variables might change the truth value of that assumption. That is, in the
A* matrix, given an input variable $x \in X_V$ and an intermediate assumption p,
if there is a triple $t \in a^*_{xp}$ (and therefore, a simple path from x to p),
such that the edges in that path are in the metapath, then p can be affected by
changes to the value of x; otherwise, p is not affected by x. For example, in
Figure 7.20, we can determine that the input variable UCOST does not affect the
intermediate assumption pkv, and thus changes to UCOST will not affect the
validity of the metapath $\{e_1, \ldots, e_5, e_8, e_9\}$ from {ADV, ECON, UCOST,
VOL} to {NI}.
On the other hand, the input variable ADV can change the truth value of pkv 
through a change in VOL, and thus a change in the value of ADV may result in 
this metapath becoming invalid. 
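The question of which inputs can disturb an intermediate assumption reduces to element-level reachability through the edges of the metapath. The sketch below uses a breadth-first search in place of a lookup in A*; the function name and the three toy edges are hypothetical and do not reproduce Figure 7.20.

```python
from collections import deque
from typing import List, Set, Tuple

Edge = Tuple[Set[str], Set[str]]  # (invertex, outvertex)


def can_affect(metapath_edges: List[Edge], x: str, p: str) -> bool:
    """True if a chain of metapath edges leads from element x to element p."""
    seen = {x}
    frontier = deque([x])
    while frontier:
        current = frontier.popleft()
        for invertex, outvertex in metapath_edges:
            if current not in invertex:
                continue
            for y in outvertex:
                if y == p:
                    return True
                if y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return False


# Hypothetical metapath (illustrative only).
toy_edges: List[Edge] = [
    ({"ADV", "ECON"}, {"VOL"}),
    ({"VOL"}, {"pkv"}),
    ({"UCOST", "VOL", "pkv"}, {"NI"}),
]
adv_hits_pkv = can_affect(toy_edges, "ADV", "pkv")
ucost_hits_pkv = can_affect(toy_edges, "UCOST", "pkv")
```

In this toy metapath ADV reaches pkv through VOL, whereas UCOST feeds only the final edge and cannot change pkv.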




Chapter 8 

METAGRAPHS IN DATA AND RULE 
MANAGEMENT 



We now extend the results of Chapter 7 to encompass two additional informa- 
tion structures. The first is data bases, in which each edge represents a data 
relation with the key attributes as invertex and content attributes as outvertex. 
The second information structure is rule bases in which each edge represents 
a production rule with the antecedent (as a conjunction of propositions) as in- 
vertex and the consequent (also as a conjunction of propositions) as outvertex. 

A simple example illustrating the use of metagraphs in representing models,
data files, and rules is diagrammed in Figure 8.1. Figure 8.1(a) illustrates
a model constrained by a rule. Edge $e_1$ is a model that calculates Profit in
the outvertex using Sales and COGS (cost of goods sold) in the invertex.
However, this model is valid only if proposition $p_2$ is valid. This
proposition states simply that the model is valid. It is calculated from another
proposition $p_1$, which might state, for example, that “Sales < 1000”, and an
edge is used to calculate $p_1$ from the value of Sales. Edge $e_2$ is used to
calculate $p_2$ from $p_1$ - that is, it states that if Sales < 1000 then the
model is valid.

Figure 8.1. Interactions between models, data, and rules: (a) model constrained
by a rule; (b) data relation with integrity constraint; (c) rule instantiated
by a model.

We note that the rule represented by the edge $e_2$ does not tell us whether
the model is invalid if Sales exceeds 1000. That is, it does not state that the
model is valid if and only if Sales < 1000 (i.e., $p_1 \equiv p_2$). At present, we
are discussing only positive propositions. However, we will briefly discuss
complementary propositions in Section 3 of this chapter.

Figure 8.1(b) illustrates a data relation with an integrity constraint. The edge
$e_1$ represents the data relation with SSN (social security number) in the
invertex as key attribute and Name, Rank, and Salary in the outvertex as content
attributes. The constraint is that “IF Rank < 5, THEN Salary < \$20,000”.
Edge $e_2$ represents this implication - that is, it states $p_1 \rightarrow p_2$,
in which $p_1$ is “Rank < 5” and $p_2$ is “Salary < \$20,000”. The constraint
states that if Rank < 5, then Salary < \$20,000; it does not state what will
happen if Rank > 5. Edge $e_3$ calculates proposition $p_1$ as a function of Rank,
and edge $e_4$ calculates $p_2$ as a function of Salary. This example shows how
the data values returned by a database query can be validated using any
integrity constraints imposed upon them. The values of Rank and Salary
corresponding to a given value of SSN in the database table represented by $e_1$
are valid only if they satisfy the constraint represented by the rule $e_2$.

The third example includes models, data, and rules. In this case, a model
accesses a data relation to perform a calculation which is then used to
instantiate a rule. This is illustrated in Figure 8.1(c). The edge $e_1$
represents a model that uses Sales and COGS (cost of goods sold) to calculate
Taxes and Profit. Edge $e_3$ is a rule that uses Profit to determine if
proposition $p_1$ is true; $p_1$ is “Profit > \$100,000”. Edge $e_4$ is a rule
that uses DE Ratio (debt/equity ratio) to determine if proposition $p_2$ is true;
$p_2$ is “DE Ratio < 1”. Edge $e_2$ represents the rule
$p_1 \wedge p_2 \rightarrow p_3$ - that is, “IF Profit > \$100,000 AND DE Ratio < 1
THEN the company has an acceptably high rating”. Edge $e_5$ presents an
alternative method for determining if $p_3$ is true - that is, the acceptability
of the rating can be determined directly from the rating itself. This example
shows how rule $e_2$ can be used if the model $e_1$ is first used to determine a
value for Profit, which is in turn used to evaluate $p_1$, one of the antecedents
of the rule.

We can see that metagraphs may be used to model the three principal types 
of relationship found in decision support systems. These are data relations 
(in which case the elements in the generating set are data attributes), decision 
models (in which case the elements in the generating set are decision and other 
variables), and logical rules/constraints (in which case the elements in the
generating set are Boolean variables or propositions). The purpose of this chapter
is to investigate these topics in more detail. We will do so by examining three
topics. The first is the representation of rule bases as metagraphs in which the 
elements in the metagraph correspond to propositions in the rule base. We will 
see that in an acyclic metagraph the existence of a metapath connecting two 
sets of elements is equivalent to the existence of an inference path connecting 
the corresponding propositions. The second topic is the use of metagraphs in 
integrating models, rules, and data. The third topic is the use of metagraphs in 
uncovering implicit integrity constraints in rule bases. This is done in Sections
1, 2, and 3 below.

1. REPRESENTING RULE BASES AS METAGRAPHS 

We begin by examining the ways in which the connectivity properties (i.e., 
metapaths) and algebraic properties (i.e., the A* matrix) of metagraphs can 
be useful in precompiling rule bases for efficient query processing. In this 
section, we will consider knowledge bases consisting only of rules, and in the 
following section we will consider the integration of rule bases with data and 
model bases. 

When a metagraph is used to represent rule bases, each element in the gen- 
erating set represents a proposition - that is, a variable that can take on either 
of two values, true or false. Each edge represents a rule in which the inver- 
tex is the antecedent to the rule and the outvertex is the consequent of the 
rule. The propositions in the antecedent are combined conjunctively as are the 
propositions in the consequent. Thus, a rule might be “IF the account balance 
is negative AND the amount is greater than \$1000 THEN send a dunning notice
to the account AND notify the credit department”. If $p_1, \ldots, p_4$ represent
these propositions (e.g., $p_1$ is true iff the account balance is negative), then
the rule would be written $p_1 \wedge p_2 \rightarrow p_3 \wedge p_4$. More
generally, we define a rule base as follows:

Definition 8.1. A rule base is an ordered pair $T = (P, R)$ in which P is
a set of propositions $\{p_i, i = 1, \ldots, I\}$ and R is a set of rules
$R = \{r_k, k = 1, \ldots, K\}$, with each rule being an expression of the form

$$\bigwedge_{p \in Y_k} p \rightarrow \bigwedge_{q \in Z_k} q$$

with $Y_k \subseteq P$ and $Z_k \subseteq P$. $Y_k$ is the antecedent of $r_k$ and $Z_k$ is the consequent.

We note that the rule base defined here can be used to represent only Horn
clause rules, and that this limitation applies to all the results in this paper.
Thus, we cannot represent rules of the form
$\bigwedge_{p \in Y_k} p \rightarrow \bigvee_{q \in Z_k} q$. However,
Horn clause logic has proven quite useful in rule based systems, so this is not
a serious practical limitation. The rule base defined above can be represented
as a metagraph in which the following pairs are isomorphic: $p_i$ and $x_i$, P and
X, $r_k$ and $e_k$, R and E, $Y_k$ and $V_k$, and $Z_k$ and $W_k$.

Figure 8.2. Metagraph representation of a rule base.

Definition 8.2. Given a generating set X, a metagraph $S = (X, E)$ on X
with $E = \{e_k, k = 1, \ldots, K\}$ and a rule base $T = (P, R)$ with
$|X| = |P| = I$ and $|E| = |R| = K$, then S corresponds to T if for any
$1 \le i \le I$ and $1 \le k \le K$, (1) $x_i \in V_k$ iff $p_i \in Y_k$, and
(2) $x_i \in W_k$ iff $p_i \in Z_k$. If S corresponds to T,
then X corresponds to P and E corresponds to R.

In the remainder of this chapter, we will use metagraph notation to refer to 
both metagraphs and rule bases. Thus, the metagraph of Figure 8.2 represents 
a rule base consisting of two rules: $x_1 \rightarrow x_2 \wedge x_3$ and $x_3 \wedge x_4 \rightarrow x_5$. We now
establish a correspondence between metapaths and valid inferences. 

Theorem 8.1. Let $X = \{x_i, i = 1, \ldots, I\}$ be a generating set and
$S = (X, E)$ be a metagraph on X corresponding to a rule base $T = (P, R)$. Given
two nonempty disjoint sets of elements $X_1, X_2 \subseteq X$, corresponding to two
sets of propositions $P_1, P_2 \subseteq P$, there is an acyclic metapath M from
$X_1$ to $X_2$ in S if and only if the following implication is valid in T:

$$\bigwedge_{p \in P_1} p \rightarrow \bigwedge_{q \in P_2} q$$


Proof. (IF) Assume that there is a metapath M from $X_1$ to $X_2$. Without loss of
generality, assume that $|W_k| = 1, \forall e_k \in M$. Now consider the following
procedure:

Procedure ProofFromMP($X_1$, $X_2$, M)

Let $M_0 = M$, $X_0 = X_2$, $G = \emptyset$

While $X_0 \neq \emptyset$, DO

Step 1. Find $R \subseteq M_0$ such that $X_0 = \bigcup_{e_k \in R} W_k$.

Step 2. $G = G \cup \{(x, y) \mid (e_k \in R) \wedge (x \in V_k) \wedge (y \in W_k)\}$.

Step 3. $X_0 = \bigcup_{e_k \in R} V_k \setminus \{X_1 \cup X_0\}$.

Step 4. $M_0 = M_0 \setminus R$.

Repeat;

End.

We prove the result by showing that the above procedure always terminates
successfully with a proof tree for $X_2$ from $X_1$. First, by definition of a
metapath, we know that R can be found in step 1 in the first iteration. It
follows then that $X_2$ can be inferred using R from the set of elements
$\bigcup_{e_k \in R} V_k$. Step 3 identifies all leaf nodes in G that are not part
of $X_1$, and then extends the proof tree backward from these elements. Since the
metagraph is acyclic and $|W_k| = 1$ for all $e_k \in M$, the rules considered for
these must be distinct from the earlier set R. Also, since M is finite and in
each iteration the candidate edge set is reduced, the procedure is guaranteed to
terminate. The only possibility of unsuccessful termination is if
$X_0 \neq \emptyset$ in step 3 and $M_0 = \emptyset$ in step 4 in the same
iteration. However, this is impossible, since we know by definition that

$$\bigcup_{e_k \in M} V_k \setminus \bigcup_{e_k \in M} W_k \subseteq X_1$$

and thus, $X_0$ is made up of elements that have not been considered so far,
and which, not being in $X_1$, must be outputs of rules in M that have not been
considered (i.e., are in $M_0$). Thus, when the procedure terminates, it always
yields a proof tree for $X_2$ in which all leaf nodes are elements of $X_1$, which
is the desired result.

(ONLY IF) Assume that $X_2$ can be inferred from $X_1$. A set of elements $X_2$
can be inferred from another set of elements $X_1$ using a set of rules E if there
is a proof tree whose non-leaf nodes are a superset of $X_2$, whose leaf nodes are
a subset of $X_1$, and all of whose edges correspond to rules in E such that each
non-leaf node x is the consequent of a rule whose antecedents are the children
of x in the proof tree.

Since the metagraph is acyclic, every directed path to a non-leaf node can
be extended to a leaf node that is part of $X_1$. Since every edge in the proof
tree is based on an edge in the metagraph, if we consider the metagraph edges
corresponding to all the edges in the proof, we get a set M such that every
edge is on a path from $X_1$ to $X_2$. Furthermore, since all the leaf nodes are
within $X_1$ and all elements of $X_2$ are non-leaf nodes,
$X_2 \subseteq \bigcup_{e_k \in M} W_k$ and
$\bigcup_{e_k \in M} V_k \setminus \bigcup_{e_k \in M} W_k \subseteq X_1$, so that M
is indeed a metapath from $X_1$ to $X_2$.

Thus, in Figure 8.2 we can infer that $x_1 \wedge x_4 \rightarrow x_5$ because of
the existence of a metapath $M(\{x_1, x_4\}, \{x_5\})$. Similarly, in the metagraph
of Figure 8.3 we can infer $x_1 \rightarrow x_6$ because of the existence of a
metapath $M(\{x_1\}, \{x_6\})$. On the other hand, in the cyclic metagraph of
Figure 8.4 there is a metapath $M(\{x_1\}, \{x_2, x_3, x_4\})$, but we cannot infer
$x_1 \rightarrow x_2 \wedge x_3 \wedge x_4$. For example, if we have $x_1$ true and
$x_2, x_3, x_4$ false, then the rules corresponding to $e_1$ (i.e.,
$x_1 \wedge x_2 \rightarrow x_3$) and $e_2$ (i.e., $x_3 \rightarrow x_2 \wedge x_4$)
are both true, but $x_1 \rightarrow x_2 \wedge x_3 \wedge x_4$ is false. □

Figure 8.3. Metagraph with acyclic metapath.

Figure 8.4. Cyclic metagraph.
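The contrast between the acyclic and cyclic cases can be checked mechanically. The sketch below encodes the two rules of Figure 8.2 and the two rules of Figure 8.4 as stated in the text, tests the two set conditions used in the proof (inputs not produced by any edge lie in the source, and the target is covered by the outputs), and then runs ordinary forward chaining. The function and variable names are our own, and forward chaining is used here as a stand-in for inference over definite rules.

```python
from typing import Iterable, List, Set, Tuple

Rule = Tuple[Set[str], Set[str]]  # (invertex / antecedent, outvertex / consequent)


def satisfies_set_conditions(
    edges: List[Rule], source: Iterable[str], target: Iterable[str]
) -> bool:
    """Check (union of V minus union of W) within source, and target within union of W."""
    inputs = set().union(*(v for v, _ in edges))
    outputs = set().union(*(w for _, w in edges))
    return (inputs - outputs) <= set(source) and set(target) <= outputs


def infer(edges: List[Rule], facts: Iterable[str]) -> Set[str]:
    """Forward chaining: fire every rule whose antecedent is fully known."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for v, w in edges:
            if v <= known and not w <= known:
                known |= w
                changed = True
    return known


fig_8_2: List[Rule] = [({"x1"}, {"x2", "x3"}), ({"x3", "x4"}, {"x5"})]
fig_8_4: List[Rule] = [({"x1", "x2"}, {"x3"}), ({"x3"}, {"x2", "x4"})]

acyclic_ok = satisfies_set_conditions(fig_8_2, {"x1", "x4"}, {"x5"})
acyclic_inferred = "x5" in infer(fig_8_2, {"x1", "x4"})
cyclic_ok = satisfies_set_conditions(fig_8_4, {"x1"}, {"x2", "x3", "x4"})
cyclic_inferred = infer(fig_8_4, {"x1"})
```

For Figure 8.2 both checks agree. For Figure 8.4 the set conditions hold, yet forward chaining from $x_1$ alone derives nothing new, which matches the counterexample in the proof.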

We now demonstrate that the construction of A* can be useful in determining
whether a set of propositions (denoted $B \subseteq X$) is sufficient to infer
a second set of propositions (denoted $C \subseteq X$). This can be accomplished by
means of a localized search involving only those triples in A* contained in
members $a^*_{ij}$ for which $x_i \in B$ and $x_j \in C$.

Theorem 8.2. Let X be a generating set and $S = (X, E)$ be a metagraph on
X. Given two nonempty disjoint sets of elements $B, C \subseteq X$ and the set of all
simple paths $\Theta = \{h_1, h_2, \ldots, h_q\}$ from any element $x \in B$ to some
$y \in C$, if M is a metapath from B to C, then $\exists H \in 2^{\Theta}$, where
$2^{\Theta}$ is the power set of $\Theta$, such that M = Set(H).

Proof. By definition, every edge $e_i$ in a metapath M(B, C) must lie on a
simple path $h_i$ from some element $x \in B$ to some $y \in C$, where
$\mathrm{Set}(h_i) \subseteq M$. That is, M is made up of edges that comprise a set
of simple paths $H_1 = \{h'_1, h'_2, \ldots\}$ from elements in B to elements in C
(i.e., $M = \bigcup_i \mathrm{Set}(h'_i)$). Clearly, since $\Theta$ is the set of all
such paths, $H_1 \subseteq \Theta$, and the result follows. □

Thus, the search for a metapath from B to C can be limited to unions of
the edges in simple paths from elements in B to elements in C. All such paths
are contained in the triples contained in the members $a^*_{ij}$ for which
$x_i \in B$ and $x_j \in C$. The following theorem simplifies this task even further.

Theorem 8.3. Given q paths $h_1, \ldots, h_q$ from a set of elements B to some
set of elements C, let $\alpha_i = \bigcup$ (invertices on ith path) and
$\beta_i = \bigcup$ (outvertices on ith path). If

$$\bigcup_{i=1}^{q} \alpha_i \setminus \bigcup_{i=1}^{q} \beta_i \subseteq B,$$

then $\bigcup_{i=1}^{q} \mathrm{Set}(h_i)$ is a metapath from B to C.

Proof. Follows from the definition of a metapath. □ 

For example, consider the metagraph of Figure 8.3 and its adjacency matrix
in Figure 8.5. From the closure A* in Figure 8.6 we can determine whether
$x_1 \rightarrow x_6$ by examining the triples in $a^*_{16}$. There are two such
triples, corresponding to $\langle e_1, e_2, e_4 \rangle$ and
$\langle e_1, e_3, e_4 \rangle$, with $(\alpha, \beta)$ components as follows:

First triple: $\alpha_1 = \{x_3\}$, $\beta_1 = \{x_2, x_3, x_4\}$.

Second triple: $\alpha_2 = \{x_4\}$, $\beta_2 = \{x_2, x_3, x_5\}$.

Since $(\alpha_1 \cup \alpha_2) \setminus (\beta_1 \cup \beta_2) = \emptyset \subseteq \{x_1\}$, we have $x_1 \rightarrow x_6$.

In this section, we have assumed that the metagraphs under consideration
are acyclic. We note that a metapath constructed from acyclic simple paths
need not be acyclic. Consider the metagraph of Figure 8.7, and let
$M(\{x_1\}, \{x_4, x_5\}) = \mathrm{Set}(h_1) \cup \mathrm{Set}(h_2)$, where
$h_1(x_1, x_5) = \langle e_1, e_2, e_3 \rangle$ and
$h_2(x_1, x_4) = \langle e_4, e_5, e_6 \rangle$. Both of these paths are acyclic, but
the metapath, which consists of all six edges, is cyclic. Even so, we can prove
that $x_1 \rightarrow x_4 \wedge x_5$, since there is an acyclic metapath
$\{e_1, e_3, e_4, e_6\}$ from $\{x_1\}$ to $\{x_4, x_5\}$.

These results may be very helpful in the processing of large rule bases for
which there are likely to be a variety of queries, since A* provides a
precompilation of the rule base. Of course, the computational effort needed to
answer any query will depend on the number of triples in the intersection
members of A*, but in rule bases where the number of paths between any two
elements is not very large, precompilation of A* may facilitate localized
search for solutions to inference problems.

Figure 8.5. The adjacency matrix of the metagraph in Figure 8.3.

Figure 8.6. The closure of the adjacency matrix in Figure 8.5.

Figure 8.7. Metagraph with cyclic and acyclic metapaths.



















































2. INTEGRATING RULES, MODELS AND DATA 

We have already discussed the need to support different resources such as 
models, rules and data in a DSS. In this section, we discuss how metagraph 
representation of these resources can be useful both during DSS design and
use. To facilitate our discussion, we use the term “knowledge base” to describe 
the structure containing all these resources in the DSS. In other words, the 
knowledge base is the combination of the model base, rule base and data base 
contained in a DSS. 

We start by defining a metagraph for a DSS knowledge base. The generating 
set X for such a metagraph consists of two types of elements. That is,
$X = X_P \cup X_V$, where $X_P$ is a set of propositions, and $X_V$ is a set of variables.
The basic difference between a variable and a proposition is that the latter is 
a logical variable restricted to a truth value (in this paper, we will assume a 
Boolean truth value), while the value space for a variable can be any arbitrary 
set. 

The edges of the metagraph then correspond to the different resource mod- 
ules, namely models, rules and data. In the case of a model edge, the invertex 
consists of the inputs and assumptions, while the outvertex identifies the out- 
put variables. A data edge represents a functional dependency between the 
invertex elements (key) and the outvertex elements (non-key/content). A number
of significant classes of edges can be characterized, based on the above
partitioning of the generating set:

1. If $V(e) \subseteq X_P$ and $W(e) \subseteq X_P$, then $e$ is a rule.

2. If $V(e) \subseteq X_V$ and $W(e) \subseteq X_P$, then $e$ is a proposition definition.

3. If $V(e) \subseteq X_V$ and $W(e) \subseteq X_V$, then $e$ is an unconstrained model or data
relation.

4. If $V(e) \subseteq X_P$ and $W(e) \subseteq X_V$, then $e$ is an equality predicate.

5. If $V(e) \subseteq X$ and $W(e) \subseteq X_V$, then $e$ is a constrained model or data
relation with domain constraints.
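The classification can be sketched in a few lines of Python. This is only an illustration under our reading of the five classes: rules map propositions to propositions, proposition definitions map variables to propositions, unconstrained models or data relations map variables to variables, equality predicates map propositions to variables, and constrained models have a mixed invertex and a variable-only outvertex. The `Edge` type, the `classify` function and the sample sets `P` and `V` are hypothetical names introduced here, not part of the metagraph formalism.

```python
from typing import FrozenSet, NamedTuple


class Edge(NamedTuple):
    name: str
    invertex: FrozenSet[str]
    outvertex: FrozenSet[str]


def classify(edge: Edge, propositions: FrozenSet[str], variables: FrozenSet[str]) -> str:
    """Assign an edge to one of the five classes, or report that it fits none."""
    v_in_p = edge.invertex <= propositions
    v_in_v = edge.invertex <= variables
    w_in_p = edge.outvertex <= propositions
    w_in_v = edge.outvertex <= variables
    if v_in_p and w_in_p:
        return "rule"
    if v_in_v and w_in_p:
        return "proposition definition"
    if v_in_v and w_in_v:
        return "unconstrained model or data relation"
    if v_in_p and w_in_v:
        return "equality predicate"
    if w_in_v:
        # Mixed invertex (variables and propositions), variable-only outvertex.
        return "constrained model or data relation"
    # The classification is not exhaustive, so some edges fall outside it.
    return "unclassified"


# Illustrative generating set: three propositions and three variables.
P = frozenset({"p1", "p2", "p3"})
V = frozenset({"Pri", "Vol", "Dis"})

rule = Edge("r", frozenset({"p1", "p2"}), frozenset({"p3"}))
definition = Edge("d", frozenset({"Pri"}), frozenset({"p1"}))
model = Edge("m", frozenset({"Pri"}), frozenset({"Vol"}))
equality = Edge("q", frozenset({"p3"}), frozenset({"Dis"}))
constrained = Edge("c", frozenset({"Pri", "p1"}), frozenset({"Vol"}))
mixed_output = Edge("x", frozenset({"Pri"}), frozenset({"Vol", "p1"}))

for e in (rule, definition, model, equality, constrained, mixed_output):
    print(e.name, "->", classify(e, P, V))
```

The last sample edge has a mixed outvertex and is reported as unclassified, which matches the remark that the classification is not meant to be exhaustive.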

In order to understand the significance of the above characterizations, we 
present the following observations: 

• The classification is not meant to be exhaustive. For instance, it is quite 
possible for an edge representing a model to have both variables and 
propositions in either or both vertices. The primary objective of the 
classification is to identify certain classes that may be significant in 
metagraph-based analysis during system design and use. 

• Even though it is possible to design modules that include features of sev-
eral of the above classes, it is a good idea to design modules such that
each module belongs to one of these classes. The reason for this is that
it makes the modules easier to implement. For instance, if rules are
constrained to propositions only, then they can be implemented by a sim-
pler knowledge representation construct, and used with a simpler infer-
ence engine. If, during requirements determination, modules are identi-
fied that combine features of multiple classes, it may often be desirable
to decompose such modules into sets of simpler modules that do con-
form to the classes. The functionality of the original complex module can
then be recreated through appropriate module integration during problem
solving.

• Even though the different types of modules are represented using a com- 
mon construct, a metagraph edge, their semantics are quite different. For 
instance, rule edges are logical implications; thus, the truth value of the 
outvertex propositions (consequent) is determined only when the invertex 
propositions (antecedent) are all true. In general, whenever one or more 
propositions occur in the invertex of an edge, the relationship underlying 
the edge is interpreted as being conditional upon these propositions being 
true. For instance, in the case of a constrained model edge, the underlying 
model can be used to compute the outvertex variables from the invertex 
variables, as long as the propositions in the invertex are true. Similarly, 
for a range restricted data relation, the corresponding edge represents a 
functional dependency, which can be used as an integrity constraint upon 
the database. That is, the functional dependency can be used to validate 
updates, and also to construct data access plans in query processing. Note 
that the functional dependency can be used to access data only when the 
specified invertex variables are assigned values that correspond to exist- 
ing attribute values in the current extensional database. 

• An equality predicate is an edge that can be used to assign a value to
a variable, given the value of one or more propositions. This type of
edge is likely only when the invertex consists of a single proposition
that is an assignment statement and the outvertex is a single variable.
For instance, the proposition $p_1$, “the company is of type X”, can be
used to evaluate the value of the variable company type, using the edge
$(\{p_1\}, \{\text{company type}\})$. However, the proposition “$A \geq 5$” is
insufficient to compute the value of $A$ (of course, it could be combined
with another proposition “$A \leq 5$” to compute $A$, but such edges are not
likely to occur very often).

We now discuss an application of the metagraph representation. We show 
how a metapath can be used as a basis for identifying collections of resource 
modules that can be used to solve certain problems. 

Consider a situation where a user wants to obtain values for a set C of
elements (variables and/or propositions), given values for another set B of
elements. Depending on the available knowledge base, there may be a number of
possible solution plans that can achieve this. When the knowledge base
consists of models and data, potentially feasible solution plans can be
identified by finding metapaths from the set B to the set C in the metagraph
representation. Similarly, for a rule base, the search for inference plans
for deducing C from B can be formulated as a search for an acyclic metapath
from B to C. We now examine the general case when the knowledge base
contains all three types of components.

While the general intuition still holds in this case, that a metapath represents 
a potentially feasible solution plan, the issue of cycles has to be considered. 
Recall that in a rule base, a metapath is usable as a basis for inference only 
when it is acyclic. On the other hand, in a model and/or database, there are 
no such constraints. For instance, the existence of a cycle in a metapath corre- 
sponding to an integrated model (a collection of models for a given B, C pair), 
merely denotes a set of variables that have to be equilibrated through possible 
repetitive iteration through the edges in the cycle. In other words, the exis- 
tence of a cycle in a metapath does not invalidate it in the case of an integrated 
model, but does present problems in the case of a rule-based inference process. 
What then is the situation when the metapath corresponds to a combination of 
rule-based inference, model execution and data access? 

In order to address this question, we need to define some additional terms: 

Definition 8.3. Given a metapath $M(B, C)$ in a metagraph, and an element
$x \in X$, we say that $x$ is cyclic within $M$ if there is a cycle $h(x, x)$
such that $\mathrm{Set}(h(x, x)) \subseteq M$.

Definition 8.4. An element $x$ is said to be used in a metapath $M(B, C)$ if
$\exists e \in M$ such that $x \in V(e) \cup W(e)$.

We now have the following result which extends our earlier work, for the 
general case: 

Theorem 8.4. Given a metapath M(B,C) consisting of model, rule and 
data edges, these modules can be used to compute values for the elements 
of C, given specific values for the elements of B, if none of the propositions 
used in M are cyclic within M. 

Proof. It has been shown in Chapter 2 that the computability of a set of vari-
ables C from a given set of inputs B using a set of data relations and models
can be determined by the existence of a metapath from B to C. Thus, we only
need to examine the case where the available knowledge base includes rules.
However, by Theorem 8.1, we know that for a rule base, a set of propositions
C can be inferred from another set B if there is an acyclic metapath from B
to C (i.e., if all the propositions in the metapath are acyclic within it). Thus,
the result follows. □

We can see from Chapter 2 that a metapath consisting of model and data 
edges is sufficient to compute C given B. This follows from the definition of a 
metapath, and is independent of whether the metapath is cyclic or acyclic.
Theorem 8.4 shows that rule edges can be used in the metapath as well, provided
that there are no cycles through the propositions in these rules.

Although the qualification about cyclic propositions in Theorem 8.4 is sig- 
nificant, it can be handled quite conveniently in practice. This is because of the 
A* matrix, which enables the necessary check to be performed very easily, in 
two steps. First, for each proposition used in the metapath, we can determine 
whether it is cyclical by examining the diagonal cell of A* corresponding to 
that proposition. Clearly, if that cell is empty, then the proposition is acyclic. 
Second, if the diagonal cell is non-empty, we can check whether any of the 
triples in that cell corresponds to a path that is contained entirely within M. If 
not, then the proposition is still not cyclic within M, and the latter can be used 
to construct a solution plan. 
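The check can also be illustrated without the A* matrix. The sketch below is not the two-step A* lookup described above; it swaps in a direct breadth-first search over the edges of M, which answers the same question for small examples: does some cycle through x use only edges of M? The function names and the three-edge metapath are hypothetical and serve only as an illustration.

```python
from collections import deque


def successors(element, edges):
    """Elements reachable in one step: outvertices of edges whose invertex holds element."""
    reached = set()
    for invertex, outvertex in edges:
        if element in invertex:
            reached |= outvertex
    return reached


def is_cyclic_within(x, edges):
    """True if a simple path made only of the given edges leads from x back to x."""
    seen = set()
    frontier = deque(successors(x, edges))
    while frontier:
        y = frontier.popleft()
        if y == x:
            return True
        if y not in seen:
            seen.add(y)
            frontier.extend(successors(y, edges))
    return False


# Hypothetical metapath: variables a and b feed each other, p and q are propositions.
M = [
    ({"a", "p"}, {"b"}),   # constrained model: a and p determine b
    ({"b"}, {"a"}),        # model: b determines a (closes a cycle through a and b)
    ({"b"}, {"q"}),        # proposition definition: b determines q
]

for element in ("a", "b", "p", "q"):
    print(element, is_cyclic_within(element, M))
```

In this small case the cycle runs through the variables a and b only, so neither proposition is cyclic within M and the condition of Theorem 8.4 is met.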

We now present an example, to illustrate the use of metagraph representa- 
tions of DSS knowledge bases. Consider a DSS knowledge base consisting of 
the following modules: 

$e_1$: Rev, Trate, Exp $\to$ NI (this is an accounting model that computes net
income, given revenue, expenses and the applicable tax rate).

$e_2$: Pri, Econ $\to$ Vol (this is a marketing model that determines volume
demanded, given unit price and the value of an economic indicator).

$e_3$: Vol $\to$ Exp (this is a simple accounting model that computes expenses
as a function of volume of sales).

$e_4$: Pri, Vol, Dis $\to$ Rev (this is a marketing model that computes revenue
for any given level of price, volume and discount rate).

$e_5$: $p_1, p_2 \to p_3$ (this is a rule stipulating that the applicable discount rate is
10% if the item’s unit price is less than or equal to \$100 and the volume
demanded is greater than or equal to 1000 units, i.e., $p_1 \wedge p_2 \to p_3$).

$e_6$: $p_3 \to$ Dis (this is an equality predicate that allows the assignment of a
value, 10%, to the discount rate variable when the corresponding propo-
sition is true. Note that the converse relationship also holds; that is,
given a value for the discount rate, the proposition “Dis = 10%” can be
evaluated as well).

$e_7$: Vol, Exp $\to$ Pri (this is a pricing model used to determine the optimal
price for any given level of volume and expenses).

$e_8$: Pri $\to p_1$ (this module defines the proposition “Pri $\leq$ \$100” from the
variable Pri).

$e_9$: Vol $\to p_2$ (definition of the proposition “Vol $\geq$ 1000” from the variable
Vol).

Figure 8.8. Metagraph representation of a DSS knowledge base.

The interpretation of the propositions is as shown in Figure 8.8, which 
shows a metagraph representation of the knowledge base. Analysis of this 
metagraph and its adjacency matrix A (and the corresponding closure A*) 
leads to the following conclusions: 

• If price is known, and if net income is the variable that we want to
evaluate, there are a number of paths from price to net income (e.g.,
$(e_4, e_1)$, $(e_2, e_3, e_1)$, $(e_8, e_5, e_6, e_4, e_1)$). In each case, there are additional
coinputs for which values have to be determined. For instance, in the path
$(e_4, e_1)$, the coinputs of price are volume, discount rate and expenses.

• There is a metapath $M_1 = \{e_1, e_2, e_3, e_4, e_5, e_6, e_8, e_9\}$ from price and
economic indicator to net income. In other words, given values of the first
two variables, we can compute the corresponding value of net income.
However, this is only true under certain conditions. In particular, this is
only true when the rule $e_5$ fires successfully. Thus, in this case, the rule
in effect establishes the range of application of the metapath as a viable
solution basis.

• The metapath $M_1$ is viable, even though there is a cycle within it, con-
sisting of the path $(e_2, e_7)$. However, this cycle does not invalidate the
metapath, since the cyclical elements in this case (price and volume) are
both variables, and not propositions. Thus, the cycle merely indicates that
the values of price and volume have to be equilibrated (perhaps through
repeated iteration through the cycle) as part of the solution process.
• The set of propositions used in the metapath, whether in rules or as
constraints in models, serves to define the range of application of the in-
tegrated model or solution plan represented by the metapath. In other
words, we can infer that the metapath is usable only when price is be-
low \$100, volume is greater than 1000 units, and the discount rate is
10%. Note that only those propositions that occur in the invertices of
edges have to be considered. Thus, for instance, if the outvertex of an
edge included a proposition “sales terms are net 50”, this proposition
would not be a requirement for the metapath $M_1$ to be usable.

• If the discount rate is also known (and it need not be 10%), then there is
also the metapath $M_2 = \{e_1, e_2, e_3, e_4, e_7\}$ from price, economic indicator
and discount rate to net income. Unlike $M_1$, this metapath represents
an unconstrained model. In other words, it can be used for any values
of the input variables. However, the rule $e_5$ can still serve a valuable
purpose when $M_2$ is used, namely as an integrity constraint. Thus, it can
be used to check whether the values of price, volume and discount rate
are consistent, in cases where the rule is applicable. Note that in order to
utilize $e_5$ in this way, we have to use $e_6$ in the reverse direction, which is
acceptable for an equality predicate, as mentioned earlier.
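The observations above can be reproduced with a small forward pass over the nine modules. The sketch below is an illustration only. The dictionary `KB` encodes the modules as listed; the two edge lists are our reading of the two solution plans, one that relies on the rule and one that takes the discount rate as given; and `computable` and `conditions` are hypothetical helper names. We also assume that the tax rate is supplied with the inputs, since $e_1$ lists it as an input, and that every proposition on the path holds, that is, the rule fires.

```python
# Each module maps an invertex (inputs) to an outvertex (outputs).
KB = {
    "e1": ({"Rev", "Trate", "Exp"}, {"NI"}),
    "e2": ({"Pri", "Econ"}, {"Vol"}),
    "e3": ({"Vol"}, {"Exp"}),
    "e4": ({"Pri", "Vol", "Dis"}, {"Rev"}),
    "e5": ({"p1", "p2"}, {"p3"}),
    "e6": ({"p3"}, {"Dis"}),
    "e7": ({"Vol", "Exp"}, {"Pri"}),
    "e8": ({"Pri"}, {"p1"}),
    "e9": ({"Vol"}, {"p2"}),
}
PROPOSITIONS = {"p1", "p2", "p3"}


def computable(known, edge_names):
    """Fire every listed edge whose invertex is fully known until nothing changes."""
    known = set(known)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name in edge_names:
            invertex, outvertex = KB[name]
            if name not in fired and invertex <= known:
                known |= outvertex
                fired.add(name)
                changed = True
    return known


def conditions(edge_names):
    """Propositions that occur in invertices: the range of application of the plan."""
    return {x for name in edge_names for x in KB[name][0] if x in PROPOSITIONS}


plan_with_rule = ["e1", "e2", "e3", "e4", "e5", "e6", "e8", "e9"]
plan_with_discount = ["e1", "e2", "e3", "e4", "e7"]

print("NI" in computable({"Pri", "Econ", "Trate"}, plan_with_rule))
print(sorted(conditions(plan_with_rule)))
print("NI" in computable({"Pri", "Econ", "Trate", "Dis"}, plan_with_discount))
print(sorted(conditions(plan_with_discount)))
```

The first plan depends on all three propositions, which is why it is usable only within the range they define, while the second plan has no propositions in any invertex and is therefore unconstrained.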

The above observations imply that metagraph representation of a DSS 
knowledge base facilitates discovery of a variety of useful information. An 
important point that must be stressed here is that although many of these ob- 
servations can be made by visual analysis, which might be practical for small 
knowledge bases, the strength of the metagraph approach lies in the fact that 
these conclusions can also be reached through analysis of the algebraic repre-
sentation of the metagraph through its A and A* matrices. This is important,
since it enables a metagraph-based DSS not only to help a decision maker by
providing an expressive graphical visualization tool, but also to provide active
support through analytical processes that are transparent to the user but still
provide valuable evaluative information about the available resources and po-
tential solution plans.

The use of the adjacency and closure matrices to find metapaths between 
specific sets of elements has been discussed in the previous section. We next 
show how these matrices can also be exploited to find rules that may be ap- 
plicable as integrity constraints on metapaths representing problem solutions. 

To start with, note that any rule such that all its elements are reachable from
a set of elements is a potential integrity constraint upon that set of elements.
For example, in Figure 8.8, the rule corresponding to $e_5$ is a potential integrity
constraint upon any metapath containing $e_4$, since the elements $p_1, p_2, p_3$ in
$e_5$ are all reachable from the elements of $e_4$. The simplest case is when the
elements of a rule $r$ are defined in terms of the elements in the metapath $M$.
This case can be easily identified as follows:

1. For each variable $x \in X_V$ occurring in $M$, find all the propositions
$y \in X_P$ that are reachable from $x$ in A (i.e., such that $a_{xy} \neq \emptyset$). Let
this set of propositions be $X'_P$.

2. For each proposition $y \in X'_P$, examine $a_{yz}$ for any triple $t$ such that
$z \in X'_P$ and the coinput of $y$ in $t$ is contained in $X'_P$. If any such triple is
found, then edge($t$) is a potential integrity constraint.
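The two steps can be sketched directly on edge sets rather than on the entries of A. This is an illustration under stated assumptions: an entry of A between x and y is taken to be non-empty exactly when some edge has x in its invertex and y in its outvertex; the function name `potential_constraints` is hypothetical; and the edge `e6r` is our own addition that encodes the reverse use of the equality predicate $e_6$ mentioned earlier.

```python
PROPOSITIONS = {"p1", "p2", "p3"}

KB = {
    "e1": ({"Rev", "Trate", "Exp"}, {"NI"}),
    "e2": ({"Pri", "Econ"}, {"Vol"}),
    "e3": ({"Vol"}, {"Exp"}),
    "e4": ({"Pri", "Vol", "Dis"}, {"Rev"}),
    "e5": ({"p1", "p2"}, {"p3"}),
    "e6": ({"p3"}, {"Dis"}),
    "e7": ({"Vol", "Exp"}, {"Pri"}),
    "e8": ({"Pri"}, {"p1"}),
    "e9": ({"Vol"}, {"p2"}),
}
# Assumption: the equality predicate e6 may also be read from Dis to p3.
KB_WITH_REVERSE = dict(KB, e6r=({"Dis"}, {"p3"}))


def potential_constraints(metapath, kb, propositions):
    """Rules whose propositions are all defined from variables used in the metapath."""
    used = set()
    for name in metapath:
        invertex, outvertex = kb[name]
        used |= invertex | outvertex
    variables = used - propositions

    # Step 1: propositions adjacent to some variable of the metapath.
    defined = set()
    for invertex, outvertex in kb.values():
        if invertex & variables:
            defined |= outvertex & propositions

    # Step 2: edges that lead from defined propositions to a defined proposition,
    # with every coinput also among the defined propositions.
    return sorted(
        name
        for name, (invertex, outvertex) in kb.items()
        if invertex and invertex <= defined and outvertex & defined
    )


M2 = ["e1", "e2", "e3", "e4", "e7"]
print(potential_constraints(M2, KB_WITH_REVERSE, PROPOSITIONS))
print(potential_constraints(M2, KB, PROPOSITIONS))
```

With the reverse edge present, the rule $e_5$ is reported as a potential integrity constraint on the plan; without it, $p_3$ is not defined from any variable of the plan and nothing is reported.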

A more general case is where the potential integrity constraint is indirectly 
reachable from the elements in M via a metapath consisting of rules. This 
case can be checked using the metapath search procedure itself, restricting the 
search to metapaths consisting solely of rule edges (i.e., triples for which all 
coinputs and outputs are propositions). 

3. DISCOVERING IMPLICIT INTEGRITY 
CONSTRAINTS 

In the previous sections, we have assumed that all the rules in the rule base 
represented by a metagraph contain only positive propositions. In this section, 
we discuss how information about complementary literals can be used to make 
certain useful transformations in the metagraph representation of a rule base 
that can reveal relevant integrity constraints. 

The issue of integrity maintenance is an important one in the context of
knowledge-based systems and information systems in general. One way in
which integrity is enforced in rule-based knowledge-based systems (KBSs)
is through the use of integrity constraints. While any sentence can be used as an
integrity constraint, a common and fairly widely applicable form for integrity
constraints is a statement of the form:

$\neg x_1 \vee \neg x_2 \vee \cdots \vee \neg x_N$

which can also be stated in the form

$x_1 \wedge x_2 \wedge \cdots \wedge x_{k-1} \wedge x_{k+1} \wedge \cdots \wedge x_N \to \neg x_k$

for any $k$ between 1 and $N$. Typically, as part of the definition of the rule base,
a number of such integrity constraints may be included. These constraints can 
be used not only during the problem solving process to eliminate infeasible so- 
lutions, but also as integrity checks during any updates to the rule base (Grant 
and Minker, 1991). 

In addition to the explicitly stated integrity constraints in the rule base, im-
plicit constraints may exist in the knowledge base, or may result from deter-
minations that some propositions are either complementary or mutually con-
tradictory.

Example. Consider the following rule base: $\{(\alpha \to p), (\alpha \to q)\}$, where $\alpha$ is
some conjunction of propositions containing neither $p$ nor $q$.

In this rule base, if we impose an explicit integrity constraint $(\neg p \vee \neg q)$, we
find that there is an implicit integrity constraint $\neg\alpha$ which the system designer
or user may not be aware of.

Discovery of such implicit integrity constraints can be very useful during 
problem solving. Methods for the discovery of such implicit constraints are 
valuable, since they facilitate rule base management. In this section, we show 
that metagraph representation of rules and the corresponding A* matrix can fa-
cilitate the discovery of implicit integrity constraints. For this purpose, we need
to define an augmented form of the triples in A*. Given a triple $(\{\alpha\}, \{\beta\}, \langle h \rangle)$
in $a^*_{ij}$ (where $\alpha$ and $\beta$ are sets of elements and $h$ is a sequence of edges), the
augmented triple corresponding to this is given by $(\{\alpha, x_i\}, \{\beta, x_j\}, \langle h \rangle)$.

Theorem 8.5. Given a rule base and its corresponding metagraph $S$, if the
constraint $\neg x_i \vee \neg x_j$ is added, then the following rules of transformation can
be used to simplify the A* matrix of $S$ (where $\alpha, \beta \subseteq X$, $x_i, x_j \in X$,
$\{x_i, x_j\} \cap (\alpha \cup \beta) = \emptyset$, and $h$ is a sequence of edges forming a simple
path):

1. Any augmented triple of the form $(\{x_i, x_j, \alpha\}, \{\beta\}, \langle h \rangle)$ can be deleted;

2. From any augmented triple of the form $(\{x_i, \alpha\}, \{x_j, \beta\}, \langle h \rangle)$ the integrity
constraint $\neg\alpha \vee \neg x_i$ can be inferred;

3. From any augmented triple of the form $(\{\alpha\}, \{x_i, x_j, \beta\}, \langle h \rangle)$ the integrity
constraint $\neg\alpha$ can be inferred.

Proof. We consider each rule in turn:

Transformation 1: Given the integrity constraint, $x_i$ and $x_j$ can never both
be true, so that the antecedent of the rule is never true, and thus the
rule is useless and can be deleted.

Transformation 2: Triples of this type correspond to the implication
$\alpha \wedge x_i \to \beta \wedge x_j$, which can be simplified to the two clauses

$\alpha \wedge x_i \to \beta$

$\alpha \wedge x_i \to x_j$.

The second clause can be resolved with the integrity constraint to infer the
integrity constraint $\neg\alpha \vee \neg x_i$, which is the desired result.
Figure 8.9. Metagraph illustrating integrity constraints.

Transformation 3: Triples of this type correspond to the implication
$\alpha \to \beta \wedge x_i \wedge x_j$. However, the given integrity constraint is equivalent to the ex-
pression $\neg(x_i \wedge x_j)$. Hence the consequent of the rule must always be false.
This in turn implies that $\alpha$ must always be false (i.e., $\neg\alpha$), which is the desired
result. □
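The three transformations can be summarized in a short function that inspects one augmented triple at a time. This is a sketch, not the authors' procedure: the path component of the triple is omitted because the transformations do not use it, the function name `apply_constraint` is hypothetical, and an inferred constraint is represented by the set of elements that cannot all be true together (so the set {a, b} stands for the clause "not a or not b"). Note that Transformations 2 and 3 then take the same form: the inputs of the triple cannot all hold.

```python
def apply_constraint(inputs, outputs, xi, xj):
    """Apply the constraint 'not xi or not xj' to one augmented triple.

    Returns ("delete", None), ("infer", clause) or ("keep", None), where clause
    is the frozenset of elements that cannot all be true at once.
    """
    inputs, outputs = frozenset(inputs), frozenset(outputs)
    pair = {xi, xj}
    if pair <= inputs:
        # Transformation 1: the antecedent can never be true.
        return ("delete", None)
    if pair <= outputs:
        # Transformation 3: the consequent is always false, so the inputs are too.
        return ("infer", inputs)
    if (xi in inputs and xj in outputs) or (xj in inputs and xi in outputs):
        # Transformation 2: resolve the implication with the constraint.
        return ("infer", inputs)
    return ("keep", None)


# Augmented triple with inputs {x1, x2, x4} and outputs {x3, x5}.
triple_inputs, triple_outputs = {"x1", "x2", "x4"}, {"x3", "x5"}

print(apply_constraint(triple_inputs, triple_outputs, "x1", "x4"))
print(apply_constraint(triple_inputs, triple_outputs, "x2", "x3"))
# Hypothetical triple used only to exercise Transformation 3.
print(apply_constraint({"x2", "x3"}, {"x4", "x5"}, "x4", "x5"))
```

The second call yields the element set {x1, x2, x4}, that is, the clause stating that these three propositions cannot all be true.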

The transformations in Theorem 8.5 can be used to eliminate some triples
(Transformation 1) and to extract implicit integrity constraints (Transforma-
tions 2 and 3). We illustrate this in the following example:

Example. Consider the metagraph in Figure 8.9, its A matrix in Figure 8.10,
and its A* matrix in Figure 8.11. The following separate examples illustrate
each of the three rules in Theorem 8.5:

If we introduce the constraint $\neg x_1 \vee \neg x_4$, then the triple
$(\{x_1, x_4\}, \{x_3\}, \langle e_1, e_3 \rangle)$ in $a^*_{25}$ can be deleted by Rule 1.

If we introduce the constraint $\neg x_2 \vee \neg x_3$, then the integrity
constraint $\neg x_1 \vee \neg x_2 \vee \neg x_4$ can be inferred by Rule 2.

If we introduce the constraint $\neg x_4 \vee \neg x_5$, this leads to the constraint
$\neg x_2 \vee \neg x_3$ by Rule 3.

Furthermore, if we consider not only the simple paths represented by the
triples in A*, but also metapaths, then we get the following additional trans-
formation:

Theorem 8.6. Given a rule base, if the constraint $\neg x_i \vee \neg x_j$ is added, then
for any set of elements $\alpha$ such that there is an acyclic metapath in the corre-
sponding metagraph from $\alpha$ to both $x_i$ and $x_j$, the integrity constraint $\neg\alpha$ can
be inferred.

Proof. From the previous section, we know that if there is a metapath from
$\alpha$ to both $x_i$ and $x_j$, then it follows that

$\alpha \to x_i \wedge x_j$.

However, this corresponds to the case where Transformation 3 applies in The-
orem 8.5, and the result follows. □

Figure 8.10. The A matrix of the integrity constraint metagraph.

Figure 8.11. The A* matrix of the integrity constraint metagraph.

Example. Consider yet again the metagraph in Figure 8.9, and its correspond-
ing A and A* matrices (Figures 8.10 and 8.11). The metapath $\{e_1, e_2\}$ connects
the set $\{x_1, x_2\}$ to the set $\{x_3, x_4\}$. If we add the constraint $\neg x_3 \vee \neg x_4$, then by
Theorem 8.6, we get the integrity constraint $\neg x_1 \vee \neg x_2$ (it is worth noting that
in this case, we also find that by Transformation 1, the edge can be deleted).
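Theorem 8.6 can be illustrated with the two-rule example given at the start of this section, in which a single antecedent implies both p and q. The sketch below uses plain forward chaining in place of a metapath search through A*, and it does not verify that the connecting metapath is acyclic, so it should be read as an approximation of the theorem's condition rather than an implementation of it. The names `closure` and `implied_constraint` are hypothetical, and the antecedent is simplified to a single proposition `a`.

```python
def closure(alpha, rules):
    """All propositions derivable from alpha by repeatedly firing the rules."""
    known = set(alpha)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= known and not consequent <= known:
                known |= consequent
                changed = True
    return known


def implied_constraint(alpha, rules, xi, xj):
    """If alpha leads to both xi and xj, the elements of alpha cannot all be true."""
    derived = closure(alpha, rules)
    return frozenset(alpha) if {xi, xj} <= derived else None


# Rule base {a -> p, a -> q} with the explicit constraint 'not p or not q'.
rules = [({"a"}, {"p"}), ({"a"}, {"q"})]

print(implied_constraint({"a"}, rules, "p", "q"))
print(implied_constraint({"p"}, rules, "p", "q"))
```

The first call returns the set {a}, which stands for the implicit constraint that a must be false; the second returns None because p alone does not lead to q.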

Thus, we have shown that the A* matrix for a metagraph corresponding to
a rule base provides a useful basis not only for determining valid inferences,
but also for detecting implicit integrity constraints. Furthermore, since the set of integrity
constraints only changes when the rule base is updated, the process of extract- 
ing the constraints can be viewed as part of the compilation of the rule base in 
terms of the A* matrix. 
4. METAGRAPH MODELS OF DECISION SUPPORT
SYSTEMS

In Chapter 7 we found that metagraphs provide a useful way of modeling 
decision models. The elements represented the input and output variables in 
the model, and the edges represented the models themselves, with the model 
inputs as invertices and the model outputs as outvertices. The collection of 
set-to-set mappings that make up a metagraph provides a useful framework for
model management. 

In this chapter, we have seen that the same is true of data relations and 
rules. In the case of data relations, the elements in the generating set are data 
attributes, and the edges are the relations, with the invertices as key attributes 
and the outvertices as content attributes. In the case of rules, the elements in the 
generating set are propositions and the edges are the rules, with the invertices 
as antecedents and the outvertices as consequents, both in conjunctive form. 
In both cases the collection of set-to-set mappings that make up a metagraph 
provides a useful framework for the management of these two additional types
of information found in a decision support system. 

We can also see that the concept of a metapath provides a useful framework
for analysis in these additional areas as well. Metapaths can be used not only 
to identify calculation paths for models, but also to identify access paths for 
data retrieval and inference paths for collections of rules. As with model bases, 
there must be an acyclic metapath for the inferences to be valid. In addition, 
since the same structure (metagraphs and acyclic metapaths) can be used to 
determine inferences in each of these three areas, they can be useful in collec- 
tions of data, models, and rules. 

The key to this is in the algebraic foundation of metagraphs, specifically the 
adjacency matrix and its closure. Their principal contribution is in identifying 
inference paths and in determining whether any such paths are acyclic. In addi- 
tion, when the elements in the generating set are propositions, they can be used 
to help analyze integrity constraints, including the case in which the propositions
may be negated. Thus, metagraphs and their algebraic underpinnings provide a 
powerful framework for the representation and algebraic analysis of the three 
principal types of information found in a DSS - models, data relations, and 
rules. 




Chapter 9 

METAGRAPHS IN WORKFLOW AND 
PROCESS ANALYSIS 

This is the last of the three chapters in which we examine the applications 
of metagraphs to information processing systems. In the previous two chap- 
ters we examined applications to three information structures found in deci- 
sion support systems: data, models, and rules. We now turn to yet another 
topic - workflow systems. Workflow systems integrate the judgmental and de- 
cision making efforts of humans (managers and analysts) with the information 
processing (computational and communication) activities of machines to im- 
plement business processes. 

Business processes - such as order fulfillment, product development, corpo- 
rate budgeting, and interorganizational supply chain management - cut across 
traditional organizational functions - such as purchasing, manufacturing, dis- 
tribution, and marketing. Organizational structures are often best described 
by hierarchical or matrix decomposition of functions. People are typically 
recruited, trained, and advanced within a specialized function, at least un- 
til they are advanced near or to the pinnacle of the organization. On the 
other hand, organizational goals often are accomplished by cross-functional 
business processes, especially those processes that interface the organization 
with its customers. Workflow systems crosswalk functions with processes so 
that organizational structures may be employed to accomplish organizational 
goals. 

Metagraphs provide a useful tool for modeling workflows and their un- 
derlying processes. The elements in the generating set are the objects being 
processed by the workflow. Often these are documents, such as loan appli- 
cations, credit reports, property data, and risk analysis reports. We will not 
focus here on the detailed content of these reports but rather on their flow 
among tasks, each of which transforms one set of documents into another set 
of documents. These tasks may include risk assessment, property appraisal, 
and approval/rejection of a proposal. The tasks are represented by edges in a 
metagraph. 

In this chapter we will address five issues relevant to the application of 
metagraphs to workflow and process analysis. The first is the use of meta- 
graphs in representing workflows and processes. A process is represented by 
a conditional metagraph in which the propositions have not been evaluated 
(i.e., assigned a TRUE or FALSE designation). For example, a proposition as-
sociated with an edge may be an assumption that the task represented by the
edge exceeds a minimum dollar amount. In addition, resources (e.g., special-
ized people or equipment) may be associated with tasks, and a proposition may
be that sufficient resources are available. A workflow is an instantiation of a
process - that is, a process in which the propositions have been evaluated.

The second issue is the ways in which views of a metagraph - that is, pro- 
jections of workflows - may be used to identify information interactions in the 
workflow. These include task interactions, in which one task produces an infor- 
mation element (in its outvertex) that is used in another task (in its invertex), 
and resource interactions. Both task and resource interactions can be repre- 
sented by metagraphs as well. We will introduce two new metagraph views, 
the Task Interaction Metagraph and the Resource Interaction Metagraph, to 
represent these two views of a workflow. 

The third issue concerns the synthesis of processes, which is accomplished 
by taking the union of the metagraphs representing the processes. An interest- 
ing question is whether full connectivity is preserved. If it is not, then there 
is at least one instantiation (i.e., a workflow) of the synthesized process that 
cannot be completed. We are also concerned with redundancy and full connec- 
tivity in the synthesized process. We will see that an important issue here is 
whether the process contains one or more cycles. 

The fourth issue concerns process decomposition and its implications for 
organizational design. In this case an important consideration is independence 
of decomposed subprocesses and the resources used in these subprocesses (as 
specified in the Resource Interaction Metagraph). The organizational de-
sign issues result from the dependence or independence of the submetagraph
representing the subprocess and the resources used in the subprocess. 

The fifth issue differs from the first four in that it concerns quantitative 
rather than qualitative attributes. We examine the scheduling of time-critical 
workflows. The tasks in a workflow are attributed with task durations, result- 
ing in the metagraph analogue of a PERT/CPM network. However, the result- 
ing critical path calculations are richer and more complex than has been the 
case with the directed graphs used in PERT/CPM project networks. 

We will examine these five issues in the five sections below. 

1. REPRESENTING WORKFLOWS AND 
PROCESSES WITH METAGRAPHS 

Most organizations are organized by function, such as purchasing, manu- 
facturing, marketing, engineering, and accounting. Human, physical, and fi- 
nancial resources are often managed by function, and most coordination takes 
place within functions rather than between functions. Because of this, these 
functions are often called “stovepipes”, stressing the fact that most commu- 
nication and coordination takes place vertically (up and down within each 
stovepipe), rather than horizontally (across stovepipes). But many organiza- 
tions are finding that they must also manage processes that cross functional 
(or stovepipe) boundaries, such as order fulfillment and new product introduc- 
tion. These processes not only span the organization’s separate functional units 
but also integrate the organization with other organizations - for example, in
supply chain management. 

The problem is that although organizations are managed by function, 
processes are the entities that deliver value to the customer (Davenport, 1993; 
Marschak, 1995; Thomson 1995). In addition, some of these processes are be- 
coming quite complex, in terms of their size and the structure of both the tasks 
and information flows that make up the processes. To some degree this size 
and complexity can be addressed intuitively - that is, by thinking long and 
hard about the processes. But many people are turning to the development of 
formal models of business processes (Barua, Lee and Whinston, 1996). 

Another concept similar to that of a process is that of a workflow. The dif- 
ference is that a process may contain Boolean information elements - for 
example, whether a loan request exceeds a certain amount. A workflow is an 
instantiation of a process for a set of particular values of the Boolean informa- 
tion elements. In this case the process could result in either of two workflows, 
one in which the loan request exceeds the target amount and one in which it 
does not. Thinking ahead for a moment, we might anticipate that processes 
will be modeled as conditional metagraphs, and workflows will be represented 
by (unconditional) metagraphs that are obtained by assigning Boolean truth 
values to the assumptions in the conditional metagraphs. 

Who will be involved in the management of processes and their resulting 
workflows? As it turns out, there will be several such persons (Hammer and 
Champy 1993; Khoshafian and Buckiewicz 1995; Hammer 1996), and these 
different people will need to understand different aspects of these processes 
and workflows. Senior executives will need to understand what information el- 
ements are required and produced by a process and what resources are needed. 
Process managers will need to understand how the tasks in each workflow in- 
teract with each other through the information elements they use and produce. 
In other words, they must understand the flow of information elements through 
a process and therefore through its resulting workflows. Information technol- 
ogy managers will need to understand how information elements, tasks, and 
resources interact, so that effective operational and decision support systems 
can be designed. This suggests that we model processes in a way that facilitates 
identification and analysis of their associated workflows. 

Processes and their workflows can be modeled in several ways using different 
tools. Four perspectives commonly used in process representation are as 
follows (Curtis, Kellner and Over 1992; Kwan and Balasubramanian 1997): 

1. Informational modeling focuses on the informational entities involved 
in the process, the structure of these entities and their interrelationships. 
Thus, informational modeling is concerned with the pure inputs to a 
process, the intermediate information elements in the process, and the 
information elements that make up the outputs of a process. 

2. Functional modeling focuses on what tasks are being performed and 
what informational elements are involved in these tasks. Thus, functional 
modeling is concerned with the relationships among the various tasks 
in a process as determined by information elements that are outputs of 
some tasks and inputs of other tasks. 

3. Organizational modeling focuses on the agents/resources that will be in- 
volved in each task, where information entities are to be stored, and the 
communication needed between agents/resources. Thus, organizational 
modeling is concerned with the people, hardware, and software needed 
for a task to participate in a process. We note that people may be indi- 
viduals or categories of individuals (e.g., systems programmers). This 
distinction between individuals may also apply to hardware, software, 
and any other agents/resources. 

4. Transactional modeling (also called behavioral modeling (Curtis, Kell- 
ner and Over, 1992)) examines issues of timing (sequencing) and con- 
trol, both within and between the tasks involved in the process. Thus, 
transactional modeling is concerned with the order in which the tasks 
are to be performed (some serially and possibly some in parallel), con- 
ditioned on the assumptions associated with the tasks and the informa- 
tional outputs of previous tasks. 

Traditionally, these perspectives have been implemented separately. How- 
ever, we suggest that metagraphs provide a useful and comprehensive foun- 
dation for modeling and integrating these perspectives. That is, a significant 
contribution of metagraphs is that they integrate the informational, functional, 
and organizational perspectives within a single model. This allows not only 
graphical visualization of processes, but also their formal analysis, where the 
analysis will be accomplished by means of an algebraic representation of the 
graphical structure. In the metagraph view informational entities will corre- 
spond to the elements in a generating set, and tasks will correspond to the 
edges in the metagraph. This construct extends the features offered by tradi- 
tional graph structures such as digraphs and hypergraphs and allows us to ad- 
dress questions such as the following using algebraic operations on metagraph 
representations of processes: 

1. How do information elements relate to each other through the tasks that 
use and produce them - for example, which information elements are 
needed to determine other information elements, and which information 
elements are intermediate elements in a process that calculates other 
elements (informational modeling)? 

2. How do tasks relate to each other through the information elements that 
they use and produce - for example, if a task is disabled, what other tasks 
cannot be executed (functional modeling)? 

3. How do the resources needed to perform various tasks relate to each 
other through the tasks that use them and the information used and 
produced by these tasks - for example, what information passes from one 
resource to another and if a resource is unavailable, what other resources 
are affected (organizational modeling)? 

In addition to the questions pertaining to each perspective, there are other 
questions that span several perspectives. For instance, if a particular resource 
were to become unavailable, then several tasks might be disabled. How would 
the workflows in the process be affected? A significant aspect of the integra- 
tion enabled in our approach is that such questions can also be addressed in 
a structured manner. In addition, we will address certain transactional issues 
of timing and scheduling here, by including task duration and temporal 
constraints as additional metagraph edge attributes. The result will be a 
metagraph analogue of the directed graphs used in PERT/CPM analyses. 

Essentially, we need a theoretical framework for the representation, analy- 
sis, and manipulation of workflow systems. Metagraphs allow different com- 
ponents of processes to be represented both graphically and analytically. This 
framework also allows us to analyze both connectivity and component 
interaction of workflows using a single representational construct. 

We illustrate this framework with the example of a loan evaluation process. 
The input to the process is a document containing certain information items de- 
scribing the applicant and certain characteristics of the loan being requested. 
Various tasks are used to analyze the application, and possibly to request that 
additional information be made available, and then to arrive at a decision. 
Human and computer resources, such as loan officers, loan managers, fax 
machines, and workstations are used to accomplish these tasks. It is then 
necessary to determine the flow of information items, the scheduling of tasks, 
and the allocation of resources. These are often specified in procedures manuals 
and/or in automated workflow systems. 

We now define the central concepts of workflow systems and present some 
of the questions pertinent to workflow analysis. We begin with the following 
terms: 

1. An information element is an atomic data item (e.g., a number, a character 
string, an image, or an icon) or a collection of atomic data items (as 
in a document). 

2. A report is a collection of information elements. 

3. A task is an ordered pair of reports, the first of which is an input to the 
task and the second of which is its output. A task is executed when the 
inputs are used to determine the output. 

4. A workflow system is a set of information elements and a set of tasks, 
such that the inputs and outputs of the tasks are all in the set of informa- 
tion elements. 

5. An assumption is a proposition (which may be true or false) associated 
with a task, such that the assumption must be true for the task to be ex- 
ecuted. For example, it may be assumed that the dollar value of a trans- 
action is less than a certain amount. 

6. A resource is an entity associated with one or more tasks, and the re- 
source must be available if the tasks are to be executed. Several resources 
may be associated with a single task, and vice versa. Resources may be 
people, workstations, categories of people (i.e., roles - such as 
programmer or file clerk), etc. 

7. A process is a set of tasks that connects one set of information elements, 
called the source, to another set of information elements, called the tar- 
get. All of the inputs for any task in the process must be either in the 
source or in the output of some other task(s) in the process. 

8. A workflow is a particular instantiation of a process. Since a process may 
include decision points that can cause the process to branch in different 
ways during execution, a process can be instantiated into several possible 
workflows, each one corresponding to a particular set of values for all 
relevant branching conditions. 
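
The terms above can be sketched as a small data model. This is only an illustrative sketch under our own assumptions: the class name `Task`, the function `is_process`, and the choice of Python sets for reports are hypothetical and do not come from the text. The check implements one reading of term 7, namely that every task input must lie in the source or in the output of some other task, and that the target must be covered by the source and the task outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A task: an ordered pair of reports, plus optional assumptions and resources."""
    name: str
    inputs: frozenset                     # input report (information elements)
    outputs: frozenset                    # output report
    assumptions: frozenset = frozenset()  # propositions that must be true
    resources: frozenset = frozenset()    # entities that must be available


def is_process(tasks, source, target):
    """One reading of term 7: each task input is in the source or in the output
    of some other task, and each target element is in the source or produced."""
    tasks = list(tasks)
    source = set(source)
    for t in tasks:
        produced_by_others = set().union(*(u.outputs for u in tasks if u is not t))
        if not t.inputs <= source | produced_by_others:
            return False
    produced = set().union(*(t.outputs for t in tasks))
    return set(target) <= source | produced


# Two tasks from the loan example, transcribed from their verbal descriptions.
e1 = Task("e1", frozenset({"AC", "APD"}), frozenset({"CR"}), resources=frozenset({"b"}))
e4 = Task("e4", frozenset({"CR", "AV", "LA"}), frozenset({"LR"}), resources=frozenset({"l"}))

print(is_process([e1, e4], {"AC", "APD", "AV", "LA"}, {"LR"}))
print(is_process([e1, e4], {"AC", "APD"}, {"LR"}))
```

The second call fails the check because the inputs AV and LA of e4 are neither in the source nor produced by another task.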

The purpose of constructing a framework for workflow management is that 
it allows us to formulate questions about the relationships among the three 
important components of workflows: information, tasks, and resources. Nine 
such questions are shown in Table 9.1, and are answered in the following sec- 
tions of this chapter. 

In order to answer these questions we need an analytical framework that 
allows us to: (1) capture all of the important elements in the workflow process, 
and (2) address these questions by means of rigorous analytical procedures 
rather than visual inspection and intuition. The theory of metagraphs provides 
a basis for such a framework. In the next section, we summarize the major 
features of metagraphs and present some extensions to the theory that are per- 
tinent to workflow analysis. Then in the following section, we will use these 
features to address questions like those listed above, for workflow 
management. 

Table 9.1. Relevant questions about process components during process analysis 

Process component       Questions 

Information elements    1. Given two information elements, is one of them needed to 
                           determine the value of the other? Is it needed only under 
                           certain conditions and if so, what are the conditions? 

                        2. Given two sets of information elements, is it possible to 
                           determine the value of the second set from the elements in 
                           the first set? If not, are there any additional information 
                           elements that would make it possible to do so? 

                        3. Given a complex process, are there any ways to focus on 
                           only important information elements, hiding intermediate 
                           elements that are needed only to calculate the important 
                           ones? 

Tasks                   4. Given a task that we wish to execute, what other tasks must 
                           be executed in order to provide the information needed to 
                           do so? 

                        5. If a task is disabled, what other tasks will be affected - 
                           that is, what other tasks cannot be executed? 

Resources               6. Given a set of resources, what information passes among 
                           them as the tasks that utilize them are executed? 

                        7. If a resource is unavailable, what other resources are 
                           affected - that is, what other resources will be idle 
                           because their tasks cannot be executed? 

Interactions among      8. If a resource used in a process is unavailable, which 
components                 workflows within the process can still be completed? 

                        9. If an information element is found to be inaccurate, which 
                           resources were used, directly or indirectly, in the 
                           calculation of that element? 



2. VIEWS OF WORKFLOWS 

Each information element in a workflow can be represented as an element 
of the generating set X, or more specifically, of Xv (i.e., the variables in the 
workflows). There are other elements in the generating set, specifically Xp (the 
propositions that denote assumptions) and Xr (statements as to the availability 
of resources), but they are not information elements in the workflow. A col- 
lection of information elements comprising a report can then be represented 
as a vertex, either an invertex or an outvertex. This presumes that each report 
is either the input or output of some task, which is reasonable for a “black 
box” analysis of interactions among tasks, as discussed below. Each task is 
itself represented as an edge in the metagraph. We assume that the input to 
each task (invertex) is a report, as is its output (outvertex). This assumption is 
reasonable, since the report comprising each task’s input can be composed of 
elements from one or more reports (and/or manual inputs from some resource). 

It follows, then, that each process can be represented by a metagraph. More 
generally, a metagraph can be used to represent the tasks comprising a col- 
lection of possibly related (or overlapping) workflows comprising the process. 
For example, the risk exposure of a bank, which is determined by a series of 
steps including both computations of internal data such as outstanding loans, 
as well as market conditions and other external information, may be used both 
in the loan evaluation process as well as in planning the bank’s insurance cov- 
erage. This last point is significant, since most other graph constructs do not 
allow such overlapped representation. The collective representation of multi- 
ple workflows in a single metagraph enables analysis and possible redesign of 
these workflows in a more comprehensive manner. 

Metagraphs, like other modeling techniques based on graph theory, provide 
a black box representation of reality. That is, a task, as represented by a meta- 
graph edge, is viewed as a pair of inputs and outputs. We are not concerned 
with happenings inside the black box. Thus, representation of a task by means 
of a simple metagraph (i.e., one without any additional attributes on vertices or 
edges) does not take into consideration its duration, the quality of work done, 
the value added by the task to any process, and the type of monitoring and con- 
trol needed to detect errors in the task. However, some of this can be included 
in the representation by attaching attributes to the edges. We will see an exam- 
ple of this in Section 5 of this chapter, where we will include consideration of 
time, and specifically task duration, in a metagraph. 

Although metagraphs are limited in this regard, they do have several powerful 
advantages. First, metagraphs model the essential structure of a workflow 
system, in that they allow for an explicit representation of the components of 
the system and the interactions among them. Thus, they can be used to 
determine what sorts of information can be furnished by a workflow system, given 
the information processing capabilities of its components. Second, as we will 
see, metagraphs allow for multiple views of a workflow system - an 
element-centric view, a task-centric view, and a resource-centric view. Since each 
of these views is itself a metagraph, the same mathematical machinery can 
be used for all of them. Third, it is possible that metagraphs can be extended to 
include some of the contents of the black box mentioned above - for example, 
by labeling the edges with estimates of costs, error rates, etc. However, such 
an extension is beyond the scope of this chapter and this book. 

To complete the representation, a workflow can be represented as a meta- 
path from a set of information elements comprising a source to another set 
comprising the target. Assumptions underlying each task can also be repre- 
sented in the metagraph, by augmenting the generating set with a set Xp of 
propositions and including the relevant propositions in the invertices of the 
task edges. And finally, resources needed for each task can also be represented, 
by further augmenting the generating set with a set Xr of resources. Then, the 
resources required in each task can be represented as additional inputs to the 
corresponding edge. Note that the separation of the generating set into the three 
component sets Xv, Xp, Xr is not done merely for convenience. The primary 
motivation for this separation is that the evaluation of elements from each set 
is different. While information elements can have any value from their 
particular domain, propositions evaluate to either “true” or “false” (with a task 
becoming viable only if all its assumptions evaluate to “true”) and resources 
evaluate to either “available” or “unavailable” (with a task becoming viable 
only if all its resources are available). From a visualization perspective, the 
assumptions underlying a task and the resources it needs can be presented to the 
user as labels on the edge itself, rather than as invertex elements. However, 
the invertex representation may be more intuitive. 
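
The different evaluation of the three component sets can be illustrated with a short viability test. This is a minimal sketch under our own assumptions: the function name `is_viable` and its argument layout are hypothetical, and the invertex of a task is assumed to be already split into its information elements, propositions, and resources.

```python
def is_viable(inputs, assumptions, resources, known, truth, available):
    """Return True if a task can be executed.

    inputs      -- information elements (from Xv) in the task's invertex
    assumptions -- propositions (from Xp) in the task's invertex
    resources   -- resources (from Xr) in the task's invertex
    known       -- information elements whose values are currently known
    truth       -- dict mapping each evaluated proposition to True or False
    available   -- set of resources currently available
    """
    has_inputs = set(inputs) <= set(known)
    assumptions_hold = all(truth.get(p) is True for p in assumptions)
    resources_ready = all(r in available for r in resources)
    return has_inputs and assumptions_hold and resources_ready


# Task e9 (loan approval) as described in the text: it needs LR and RE,
# assumes AR, and is carried out by the loan officer l.
e9 = dict(inputs={"LR", "RE"}, assumptions={"AR"}, resources={"l"})

print(is_viable(**e9, known={"LR", "RE"}, truth={"AR": True}, available={"l", "s"}))
print(is_viable(**e9, known={"LR", "RE"}, truth={"AR": False}, available={"l", "s"}))
print(is_viable(**e9, known={"LR", "RE"}, truth={"AR": True}, available={"s"}))
```

A proposition that has not yet been evaluated is treated here as not true, which is a design choice of the sketch rather than something the text prescribes.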

Consider the example in Figure 9.1, which illustrates a workflow 
process that determines whether an application for a property loan is to be 
accepted or rejected. The workflow process is modeled as a conditional 
metagraph, with the incidence matrix shown in Figure 9.2. The information 
elements in the workflow process, represented by elements in the conditional 
metagraph, are as follows: 

• AC: account data relevant to the applicant; 

• APD: data about the applicant contained in the application; 

• CH: credit history of the applicant; 

• PD: data about the property for which the loan is being sought; 

• CD: data about comparable properties; 

• CR: applicant’s credit rating; 

• AV: the appraised value of the property; 

• LA: the amount of the loan; 

• BP: the current bank portfolio of loans; 

• LR: the level of risk associated with the loan; 

• RE: the bank’s current risk exposure; 

• YES: a statement that the application is approved; 

• BR: a statement that the loan being applied for is a bad risk; 

• NO: a statement that the application is rejected. 

The tasks in the workflow depend not only on the information elements 
identified above, but also on the following assumptions: 

• AR: whether the level of risk is acceptable; 

• MR: whether the loan application represents a marginally bad risk. 

There are eleven tasks in the workflow process, as follows: 

e1: the branch manager uses account data and applicant data to calculate 
the applicant’s credit rating; 

e2: the loan officer uses applicant data and the applicant’s credit history 
to calculate the applicant’s credit rating; 

e3: the property appraiser uses data about the property along with data 
about comparable properties to calculate the appraised value of the 
property; 

e4: the loan officer uses the applicant’s credit rating, the appraised value 
of the property, and the loan amount to calculate the level of risk 
associated with the loan; 

e5: if the risk of the loan is determined to be a bad risk, the branch 
manager uses the appraised value of the property and the level of 
risk associated with the loan to calculate a new loan amount; 

e6: the risk analyst and the property appraiser use the appraised value 
of the property, the loan amount, and the current bank portfolio of 
loans to calculate the bank’s current risk exposure; 

e7: the system examines the risk associated with the loan and performs 
the calculations needed to determine whether the risk is acceptable; 

e8: the system examines the risk associated with the loan and performs 
the calculations needed to determine whether the risk is a marginally 
bad risk; 

e9: if the level of risk is acceptable, the loan officer uses the risk 
associated with the loan and the bank’s risk exposure to decide whether 
to approve the loan; 

e10: the system examines the risk associated with the loan and performs 
the calculations needed to determine whether the loan 
application represents a bad risk; 

e11: if the loan application represents a bad risk, the loan officer rejects 
the application. 

Figure 9.1. A loan process metagraph. Edges (tasks): e1 - credit rating process; e2 - alt-CR process; e3 - property appraisal; e4 - risk assessment; 
e5 - loan amt. reduction; e6 - risk exposure assessment; e7 - acceptable risk assessment; e8 - marginal risk assessment; e9 - loan approval; e10 - bad 
risk assessment; e11 - loan rejection. Resources: l - loan officer; a - appraiser; r - risk analyst; b - branch manager; s - system. 

Figure 9.2. The incidence matrix for the loan evaluation metagraph in Figure 9.1. 

Finally, there are five resources used in the process, and these are shown 
in parentheses on the edge label (although semantically, they are treated as 
additional invertex elements), as follows: 

• a: a property appraiser; 

• b: the branch manager; 

• l: a loan officer; 

• r: a risk analyst; 

• s: an automated system. 

3. ANALYSIS OF INFORMATION INTERACTIONS 

We now show how the properties of metagraphs can be applied to the analy- 
sis of interactions among information elements. The analysis is operational- 
ized through the use of the A and A* matrices. 

First, the role of each information element x can be analyzed by examining 
the row and column corresponding to x in the matrices. For instance, each 
triple in the row corresponding to x in A* represents a simple path from x to 
some element, and identifies the coinputs and cooutputs of the path. Similarly, 
each triple in the column corresponding to x represents a path from some 
element to x. We can also perform a number of analyses using this information, 
such as the identification of necessary coinputs or tasks between any pair of 
elements x and y (coinputs and edges that appear in each triple in the 
corresponding cell in the A* matrix), and the identification of any cycles through 
x (identified by a*_xx ≠ ∅). For our example in Figure 9.1, we may want to 
know if it is necessary to have the account data (AC) in order to determine the 
bank’s risk exposure (RE). The visual representation may suggest that there is 
no connection between these elements. However, the cell in the A* matrix 
corresponding to the row for account data (AC) and the column for risk exposure 
(RE) has the value a*_AC,RE = {⟨{AV}, {CR, LA, LR, MR}, ⟨e1, e4, e8, e5, e6⟩⟩}, 
indicating that if the appraised value (AV) as well as the account data is known, 
then the risk exposure can be computed using the tasks corresponding to the 
list ⟨e1, e4, e8, e5, e6⟩, which would also yield the applicant’s credit rating 
(CR), the loan amount (LA), the loan risk (LR) and whether the case involves 
marginally bad risk (MR). (We do not show the entire matrix here; only 
relevant cells are described as needed.) This shows that indeed AC does affect RE. 
Examination of the cooutput also reveals that MR is one of the intermediate 
outputs. This indicates that the dependence of RE on AC occurs only when the 
case is evaluated as marginally risky. Such analysis can facilitate scheduling 
tasks in workflows, as well as resource allocation. 
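
To make the structure of the matrix cells concrete, the following sketch builds the adjacency matrix A (not its closure A*) for a fragment of the loan example. It assumes the usual construction in which each edge with x in its invertex and y in its outvertex contributes one triple of coinputs, cooutputs, and the edge itself; the function name `adjacency_matrix` and the dictionary layout are our own, and the invertices are transcribed from the verbal task descriptions rather than from Figure 9.1.

```python
from collections import defaultdict


def adjacency_matrix(edges):
    """Map each pair (x, y) to a set of triples (coinput, cooutput, path).

    edges: dict mapping an edge name to (invertex, outvertex), both sets.
    For an edge with x in its invertex and y in its outvertex, the coinput is
    the invertex without x and the cooutput is the outvertex without y.
    """
    A = defaultdict(set)
    for name, (invertex, outvertex) in edges.items():
        for x in invertex:
            for y in outvertex:
                coinput = frozenset(invertex - {x})
                cooutput = frozenset(outvertex - {y})
                A[(x, y)].add((coinput, cooutput, (name,)))
    return A


# A fragment of the loan process: both credit rating tasks and risk assessment.
EDGES = {
    "e1": ({"AC", "APD"}, {"CR"}),
    "e2": ({"APD", "CH"}, {"CR"}),
    "e4": ({"CR", "AV", "LA"}, {"LR"}),
}

A = adjacency_matrix(EDGES)
for coinput, cooutput, path in sorted(A[("APD", "CR")], key=lambda t: t[2]):
    print(sorted(coinput), sorted(cooutput), path)
```

The cell for (APD, CR) holds two triples, one per credit rating task, which is how alternative ways of computing the same element show up in the matrix. Computing A* would additionally require composing these triples along paths, which the sketch does not attempt.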

Since a process is composed of a set of tasks that connect a source set of 
information elements to a target set of information elements, each workflow 
can be represented by a metapath from the underlying process’s source to its 
target. This implies that a variety of metapath analysis mechanisms can be 
applied to workflow analysis. For example, given a source and target, the A* 
matrix can be used to identify the possible workflows for the process. In other 
words, a metapath search can help us to determine whether a process is func- 
tionally complete or not. That is, if there is no metapath available from the 
process source to the process target, then additional tasks will have to be in- 
cluded, or else some or all of the component tasks will have to be redesigned. 
For instance, we might assume that given all the information in a completed 
loan application, such as account data (AC), applicant data (APD), credit his- 
tory (CH) and property data (PD), the loan process could be completed (i.e., 
the values of the loan amount (LA) and YES could be computed). However, 
if we try to construct a metapath from {AC, APD, CH, PD} to {LA, YES}, we 
would fail. Based on visual inspection, we might add the data on comparable 
properties (CD), and try again. However, even then, no metapath is found. The 
element BP, representing information about the bank’s existing loan portfolio, 
persists as a coinput. This analysis indicates the need for BP as an essential 
input for successful loan evaluation. On the other hand, there is a metapath 
from the application data {AC, APD, CH, PD, CD} to {NO}, indicating that a 
complete workflow for unacceptable cases can be completed without informa- 
tion about BP. Again, this conclusion can be useful in designing the process 
(e.g., BP is acquired only after the loan data leads to instantiation of accept- 
able risk (AR)). Furthermore, if there are multiple workflows possible for a 
given source and target, the concept of a dominant metapath can be applied 
to facilitate choice among them. Conversely, given a specific workflow, the 
same property can be used to determine whether this workflow is efficient 
(i.e., whether it corresponds to a dominant metapath), or whether some alter- 
native workflow may be preferable. Again, in the loan process, the loan amount 
adjustment task is superfluous, unless the application reflects marginally bad 
risk (i.e., MR is evaluated as True and AR is not). Additional analyses include 
identification of source-dominant metapaths for a given target set (i.e., meta- 
paths that require a minimal source set for the given target set, even if each 
such metapath includes additional edges). The role of the applicant’s credit 
history (CH) in the workflow (metapath {e1, e2, e3, e4, e6, e7, e9}) to loan 
acceptance (YES) exemplifies this case. Initially, we might assume that the source 
for this workflow is {AC, APD, BP, CH, CD, LA, PD}, but then discover that 
this metapath is dominated by the metapath {e1, e3, e4, e6, e7, e9}, with source 
{AC, APD, BP, CD, LA, PD}, which does not include CH. 
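
The functional completeness test described above can be approximated by simple forward chaining. The sketch below is not the A*-based procedure of the text: it only computes which elements become derivable from a source, treating each proposition as available once some task has produced it. The edge table `LOAN` is transcribed from the verbal task descriptions and is therefore an assumption (the invertices drawn in Figure 9.1 may differ, and the condition on e5 is taken to be MR, following the later discussion of the loan amount adjustment). The function names are hypothetical.

```python
# Each task maps to (invertex, outvertex); AR, MR and BR are propositions.
LOAN = {
    "e1": ({"AC", "APD"}, {"CR"}),
    "e2": ({"APD", "CH"}, {"CR"}),
    "e3": ({"PD", "CD"}, {"AV"}),
    "e4": ({"CR", "AV", "LA"}, {"LR"}),
    "e5": ({"AV", "LR", "MR"}, {"LA"}),
    "e6": ({"AV", "LA", "BP"}, {"RE"}),
    "e7": ({"LR"}, {"AR"}),
    "e8": ({"LR"}, {"MR"}),
    "e9": ({"LR", "RE", "AR"}, {"YES"}),
    "e10": ({"LR"}, {"BR"}),
    "e11": ({"BR"}, {"NO"}),
}


def forward_closure(edges, source):
    """Return (derivable elements, tasks fired) starting from the source set."""
    known = set(source)
    used = []
    changed = True
    while changed:
        changed = False
        for name, (invertex, outvertex) in edges.items():
            if name not in used and invertex <= known:
                known |= outvertex
                used.append(name)
                changed = True
    return known, used


def is_functionally_complete(edges, source, target):
    """True if every target element is derivable from the source."""
    known, _ = forward_closure(edges, source)
    return set(target) <= known


print(is_functionally_complete(LOAN, {"AC", "APD", "CH", "PD"}, {"YES"}))
print(is_functionally_complete(LOAN, {"AC", "APD", "BP", "CD", "LA", "PD"}, {"YES"}))
```

With the source {AC, APD, BP, CD, LA, PD} the target YES is derivable and task e2 never fires, which is consistent with the observation that CH is not needed. The closure says nothing about dominance; it only establishes whether some set of tasks connects the source to the target.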

The projection operation on a metagraph is another useful construct for an- 
alyzing workflows. By focusing on a specific subset of information elements 
X' ⊆ Xv, it displays the relationships among these elements by identifying 
the processes that relate these elements. From a visualization standpoint, the 
projection view of a workflow metagraph is valuable, since it focuses atten- 
tion on a few important elements and tasks. At the same time, the analytical 
structure underlying the view, in terms of the composition of each projected 
edge, enables identification of the structure of specific tasks making up the 
projection. For instance, a projection over appraised value (AV), bank portfo- 
lio (BP), loan risk (LR) and YES would show an edge from {AV,BP,LR} to 
{YES} (propositions such as AR and MR can be hidden in projections, to 
simplify visualization), which illustrates that under some conditions, the factors 
determining loan acceptance are the appraised value of the property, the loan 
risk and the bank’s current loan portfolio. This is difficult to ascertain visually 
from the detailed metagraph, yet is obvious in the projection. 

Another abstraction that proves valuable in workflow analysis is a con- 
text. Recall that the assumptions applicable to each task in a workflow can 
be included as propositions in the metagraph representation. Given a set T of 
known true propositions and a set E of known false propositions, the applica- 
ble workflows under these conditions can be identified by constructing the 
corresponding context of the metagraph. The context metagraph can then be 
analyzed in all the ways described above. As with the projection operation, the 
context metagraph provides a means for simplifying complex workflows and 
focusing attention on relevant tasks and elements. Contexts also help process 
designers avoid unpleasant situations. For example, given a process defined by 
a specific source and target, verifying functional completeness of the process 
in various relevant contexts (by ensuring the existence of relevant metapaths in 
each context) can help avoid nasty surprises such as the process failing under 
certain conditions. In our example, we can construct contexts for the different 
levels of loan risk represented by the propositions AR, MR and BR, and exam- 
ine the three alternative workflows that result for the loan evaluation process. 
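
A simplified version of the context construction can be sketched as follows. It rests on our own reading of the paragraph above: an edge whose invertex contains a proposition known to be false is dropped, and propositions known to be true are removed from the remaining invertices. The formal definition of a context given earlier in the book may treat other cases (for example, propositions appearing in outvertices) that this sketch ignores, and the function name `context` and the edge table are assumptions transcribed from the task descriptions.

```python
def context(edges, true_props, false_props):
    """Return the edges that remain applicable under the given truth values.

    edges: dict mapping an edge name to (invertex, outvertex), both sets.
    Edges that require a false proposition are removed; true propositions
    are deleted from the invertices of the edges that remain.
    """
    result = {}
    for name, (invertex, outvertex) in edges.items():
        if invertex & false_props:
            continue  # the task can never be executed in this context
        result[name] = (invertex - true_props, set(outvertex))
    return result


# Conditional tasks of the loan example, with propositions in the invertices.
CONDITIONAL = {
    "e4": ({"CR", "AV", "LA"}, {"LR"}),
    "e5": ({"AV", "LR", "MR"}, {"LA"}),
    "e9": ({"LR", "RE", "AR"}, {"YES"}),
    "e11": ({"BR"}, {"NO"}),
}

acceptable = context(CONDITIONAL, true_props={"AR"}, false_props={"MR", "BR"})
print(sorted(acceptable))
print(sorted(acceptable["e9"][0]))
```

In the context where the risk is acceptable, only e4 and e9 survive, and e9 no longer lists AR among its inputs. Repeating the construction with MR or BR true would give the other two workflows mentioned above.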

Thus far, we have answered questions 1, 2, and 3 posed in Table 9.1. In 
addition, other connectivity properties, such as cycles and bridges, can be 
analyzed using our algebraic approach. For example, the cycle through the edges 
{e4, e5} can be identified using the diagonal elements of the A* matrix, and 
the fact that the edge e4 is a bridge from the application data to either loan 
outcome can be algorithmically determined. Thus, any software for workflow 
analysis based on metagraphs would have a user-friendly GUI for purposes of 
visualization, but its operation would use structured procedures based on alge- 
braic representations of metagraphs which could be used to answer questions 
such as those we have discussed in this section. 

4. ANALYSIS OF TASK INTERACTIONS 

We now turn to the analysis of tasks in workflows. Recall that tasks are 
represented in the metagraph as edges, and they appear as the third component 
of each triple in the adjacency matrix of the metagraph. 

In workflow analysis, a number of questions about tasks and their role can 
arise, as exemplified by the questions listed at the beginning of this chapter. For 
such analysis, it is useful, both from visualization and analytical viewpoints, 
to have a task-centric view of the workflow system, as opposed to the element- 
centric view considered so far. Put another way, it would be useful to have 
tasks as elements of the generating set, and edges linking sets of related tasks. 
This can be achieved using a simplified version of the inverse metagraph. In 
the context of workflow metagraphs, we call the resulting construct the task 
interaction metagraph (TIM) for the workflow system. 

To construct the task interaction metagraph using Procedure Inverse in 
Chapter 4, we modify the procedure as follows: 

1. Step 3 of the procedure is not used, since we only represent interac- 
tions among tasks in the TIM. Thus, pure inputs and pure outputs are 
excluded. 

2. Edge labels are simplified to specify only the information elements. 

This is illustrated in Figure 9.3 for our example. The TIM has as its 
elements the tasks (edges) of the original metagraph, and each edge represents 
a situation in which one or more tasks (in its invertex) communicate with 
one or more tasks (in its outvertex) by providing information to them. In 
Figure 9.3, the edge from e4 to e7, e8, e10 illustrates the dependence of the latter 
three tasks upon e4 for the value of the loan risk (LR). From the closure of 
the TIM’s adjacency matrix, we can identify the metapath {LR, BR} from e4 
to e11, which shows that once e4 is executed, the edges in the metapath can 
be executed without interaction with any other tasks. On the other hand, the 
lack of a metapath from e4 to e9 indicates that other tasks besides e4 have to 
provide information in order to execute e9. However, there is a metapath from 
{e1, e3, e5} to {e9}. This shows that the edges in that metapath can be used to 
enable e9, without interaction with any other tasks. Similarly, the cycle through 
the edges e4 and e5 reveals that these tasks may be executed multiple times in 
workflows (metapaths) in which they both appear. Such analysis, as 
exemplified by questions 1 and 2 in Table 9.1, can help in designing workflows, since 
coordination of tasks is only necessary when there are dependencies among 
them. 
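
A rough approximation of the task interaction metagraph can be computed directly from the task table. The sketch below is not Procedure Inverse: it simply creates, for every information element that is produced by some tasks and consumed by others, one edge from the set of producers to the set of consumers, labeled with that element. Pure inputs and pure outputs are skipped, in line with the first modification listed above. The function name and the edge table are assumptions, and the grouping of consumers may differ from the edges drawn in Figure 9.3.

```python
from collections import defaultdict


def task_interaction_edges(edges):
    """Return a dict: element -> (producing tasks, consuming tasks).

    edges: dict mapping a task name to (invertex, outvertex), both sets.
    Only elements that are both produced and consumed give rise to an edge.
    """
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for name, (invertex, outvertex) in edges.items():
        for x in invertex:
            consumers[x].add(name)
        for x in outvertex:
            producers[x].add(name)
    return {
        x: (frozenset(producers[x]), frozenset(consumers[x]))
        for x in producers
        if x in consumers
    }


# Risk assessment and the three tasks that examine the resulting loan risk.
EDGES = {
    "e4": ({"CR", "AV", "LA"}, {"LR"}),
    "e7": ({"LR"}, {"AR"}),
    "e8": ({"LR"}, {"MR"}),
    "e10": ({"LR"}, {"BR"}),
}

tim = task_interaction_edges(EDGES)
for element, (src, dst) in tim.items():
    print(element, sorted(src), sorted(dst))
```

For this fragment the only interaction edge is the one labeled LR, from e4 to e7, e8 and e10, which corresponds to the dependence described in the text.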

The TIM can also be useful in analyzing the impact of one or more tasks
failing during a process. Another related question is whether a particular task
is essential for a workflow. Since a task could produce multiple outputs, each
of which is used in possibly several other tasks, such questions cannot always
be answered simply through the visual representation of either the process
metagraph or its TIM. However, by testing whether a given workflow metapath
is dominant or not from the closure matrices for either representation,
these questions can be answered in a structured manner. Further, if we want to
know whether a particular task e_{p_i} in a workflow {e_{p_1}, e_{p_2}, ...,
e_{p_n}} would be disabled by the failure of another task e_j, the closure of
the TIM’s adjacency matrix can be utilized. If all the paths in the column
corresponding to e_j and rows {e_{p_1}, e_{p_2}, ..., e_{p_n}} contain e_j,
then indeed e_j is essential for e_{p_i}. For example, in the TIM in
Figure 9.3, every path from {e_1, e_3, e_5} to eg contains e_4, which is thus
essential for the workflow from {e_1, e_3, e_5} to {eg}.
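The essentiality test described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the TIM is flattened into an ordinary directed graph over tasks (so a task is treated as reachable when any one predecessor is reachable, which is weaker than the metapath condition), and the task names and edges in `toy_tim` are hypothetical rather than taken from Figure 9.3. Under these assumptions, a task is essential for a target exactly when removing it makes the target unreachable from the source tasks.

```python
from collections import deque

def reachable(graph, sources, blocked=frozenset()):
    """Return the tasks reachable from `sources` without entering `blocked`."""
    seen = {s for s in sources if s not in blocked}
    queue = deque(seen)
    while queue:
        task = queue.popleft()
        for nxt in graph.get(task, ()):
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_essential(graph, sources, target, task):
    """True if every path from `sources` to `target` passes through `task`."""
    if target not in reachable(graph, sources):
        return False  # no path exists at all, so nothing can be essential
    return target not in reachable(graph, sources, blocked={task})

# Hypothetical flattened task graph (NOT the edges of Figure 9.3).
toy_tim = {
    "t1": ["t4"],
    "t3": ["t4"],
    "t5": ["t4"],
    "t4": ["t5", "t9"],  # t4 and t5 form a cycle
}

print(is_essential(toy_tim, {"t1", "t3", "t5"}, "t9", "t4"))
print(is_essential(toy_tim, {"t1", "t3", "t5"}, "t9", "t5"))
```

A full metagraph implementation would replace the breadth-first search with a test on the closure matrix, since a task with several inputs needs all of them to be available.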

We have now answered questions 4 and 5 posed in Table 9.1. The strength 
of metagraph-based task analysis in workflow modeling is that we can per- 
form both visual inspection and algebraic analysis of workflow tasks using the 
same operations and methods as those we used earlier for analysis of infor- 
mation elements and their interactions. In other words, the approach integrates 
functional and informational analysis of a system (Curtis, Kellner and Over, 
1992), which is difficult using traditional tools for process analysis. 

5. ANALYSIS OF RESOURCE INTERACTIONS 

We now turn to the analysis of the role of each resource in the workflow, 
and interactions between different resources. A resource may be human (a per- 
son, group, team or task force), or equipment (e.g., computers, software pack- 
ages or programs, and databases). Furthermore, a resource may be a partic- 
ular entity (e.g., a specific person) or a class (e.g., a functional role, such as 
a financial analyst). There is a many-to-many mapping between tasks and re- 
sources - each task may require several resources, and each resource may be 
required in several tasks. 

An important part of business process design is the effective allocation of re- 
sources to tasks. There are two dimensions to this allocation problem. The first 
is the functional interaction among different resources, the tasks they perform 
and the information elements they use and produce. These considerations and 
their analysis impact the design of processes and their workflows. The second 
dimension is the temporal constraints applicable to the interactions that impact 
the operational control of workflows during their execution. This is important 
for monitoring and control of workflow execution (at run time), in addition to
workflow design. The approach presented in this chapter addresses only the
first dimension, since we do not include temporal attributes for tasks (although
our approach can be extended towards this end with attributed metagraphs).
Thus, our focus is on understanding how different resources interact with each
other, the tasks through which these interactions occur, and the information
that they exchange.

Figure 9.4. The R matrix for the loan evaluation process.

Interactions among resources can be specified by a resource interaction
metagraph (RIM), which shows where resources provide information to each
other through a sequence of two successive tasks. This is accomplished by
using the element flow metagraph. If the element set X' for the element flow
metagraph is restricted to the set of resources, then the result is in fact the RIM.
For our example, the G_1 and G_2 matrices for Procedure EFM correspond to
the sets of rows in Figure 9.2 denoted as G_1 and G_2. The R matrix that results
from the first step is as in Figure 9.4, and the result of the procedure is the RIM
illustrated in Figure 9.5.

Using this metagraph we can determine which resources depend on other
resources to provide information. That is, we can identify cases in which a
resource is used to perform a task that provides information to another task
which uses a second resource. For example, for the loan evaluation process, the
branch manager interacts with the loan officer by providing the latter with
the applicant’s credit rating for the risk assessment task; the loan officer in
turn performs the risk assessment and returns the risk value of the loan to
the branch manager. In addition, when the branch manager performs the loan
amount reduction task, she also provides a loan amount to the loan officer for
the loan risk assessment. This interaction can be easily visualized from the
RIM, but is difficult to visualize from the original metagraph.
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Figure 9.5. Resource interaction metagraph for loan process. 

Another application of the RIM is to analyze the impact of resource failure.
For instance, we may want to examine the impact of a failure in the automated
system (s). From Figure 9.5, it is clear that this directly impacts only the loan
officer and the branch manager, who directly interact with the system. In other
words, the tasks performed by the appraiser and the risk analyst can still be
performed. However, since at some point there may be indirect dependencies
between the system and these resources, it is important to identify the
indirect dependencies as well. Since the RIM is a metagraph, it has an adjacency
matrix and its closure, and these can be used to identify the relevant
dependencies. Thus, if the row in the closure matrix corresponding to the system
has any entries in the columns for the appraiser and the risk analyst, then there
is a dependency, and at some point, these resources will be impacted by a failure
in the system. This is indeed the case, as indicated for instance by the path from
s to {a, r} through b (the information flows MR(e%, e$), LA(es, e^)). There
are other paths too, such as the path through l and b (the information flows
AR(e-j, e>), BR(e_10, e_11)), and another involving the loop through l. Once
these dependencies are identified, their roles can be analyzed using either the
original metagraph or the RIM.
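The distinction between direct and indirect dependencies can be illustrated with a small transitive-closure computation. The sketch below makes two assumptions that are not in the text: the RIM is flattened into a simple directed graph over resources (set-valued vertices such as {a, r} are split into individual arcs), and the arcs in `rim_arcs` are a hypothetical reading of the narrative above, not a transcription of Figure 9.5.

```python
def transitive_closure(nodes, arcs):
    """Warshall-style closure: map each node to everything it can reach."""
    reach = {u: set() for u in nodes}
    for u, v in arcs:
        reach[u].add(v)
    for k in nodes:
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

# s: automated system, l: loan officer, b: branch manager,
# a: appraiser, r: risk analyst. Arcs are assumed for illustration only.
resources = ["s", "l", "b", "a", "r"]
rim_arcs = [("s", "l"), ("s", "b"), ("l", "b"), ("b", "l"), ("b", "a"), ("b", "r")]

direct = {v for u, v in rim_arcs if u == "s"}
closure = transitive_closure(resources, rim_arcs)
indirect = closure["s"] - direct

print(sorted(direct))    # resources that interact with s directly
print(sorted(indirect))  # resources affected only through other resources
```

In this toy graph a failure of `s` reaches `a` and `r` only through `b`, which mirrors the kind of indirect dependency the closure matrix is meant to expose.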

In this section, we have presented a third view of workflow metagraphs, the 
resource interaction metagraph. This view (i.e., the RIM), the resource task as- 
signment matrix, and the transformation operation used to generate it from the 
original workflow metagraph are new metagraph constructs. Using this view, 
we can answer a variety of important questions about the role of resources in 
the workflow and its underlying process, including questions 6 and 7 posed in 
Table 9.1. As with the analysis of tasks discussed in Section 3 above, while 
the original workflow metagraph can be used to address such questions, the 
RIM significantly enhances both visualization as well as analysis of resource 
interactions, which in turn can lead to better workflows and process design. 

6. INTERACTIONS AMONG DIFFERENT TYPES OF 
COMPONENTS 

The previous subsections focused on questions about the interactions 
among components of the same type - that is, among information elements, 
tasks and resources respectively. We now examine an important additional 
dimension, namely questions that span different component types. 

For instance, we may want to determine the effect of a particular resource’s 
unavailability upon the feasibility of one or more workflows. In the loan 
process, if the risk analyst (r) were unavailable, could the loan evaluation 
still be completed? We answer this question by restricting the metagraph in 
Figure 9.5 to the context where r is unavailable. This results in edge es being 
disabled. If we now try to construct metapaths to {T£5} and {AO} respectively 
from {AC, APD, PD, CH, BP}, we find that we can still complete the workflow 
for rejected loans, but not for approved loans. Thus, the risk analyst is essential 
for making positive loan decisions, but not for negative ones. 

On the other hand, if the automated system resource (s) were unavailable, 
then the edges e-j , eg and eg would be disabled, which would disconnect the 
source and target for the process, rendering the process infeasible. This is im- 
portant, since it indicates to the process designer that either the resource s 
should be designed to be highly reliable, or else that the process should be 
enhanced with additional tasks and/or resources that could be used to reduce 
the criticality of the automated system. 

Another question that may arise is: what resources are used to produce a
given set of information elements? For example, assume that bank management
raised a question about the loan risk assessment values. To determine all
the resources that contributed to the valuation of this element, we could
examine the process metagraph visually and conclude that the branch manager (b),
loan officer (l) and appraiser (a) were the relevant resources. However, by
examining the cells in the A* matrix for all resource rows i and the column
LR, we would find that the automated system s is also relevant. Especially
in complex processes with many tasks, such analysis can help determine
accountability for tasks and outcomes in a systematic manner.

We have now answered questions 8 and 9 posed in Table 9.1. 

7. SYNTHESIS OF PROCESSES 

While processes are sometimes designed from scratch, there are also many 
situations where a process has to be constructed from multiple existing 
processes. This is particularly true when processes are redesigned within an 
organization, or when multiple organizations merge or reorganize. Thus, an 
important area of process design is the analysis of processes that are composed 
of two or more component subprocesses. 

In order to appreciate the types of analysis that would be relevant to the synthesis
of a process from multiple components, one must understand the implications 
of such synthesis. Since each process has multiple possible workflows, the 
combination of multiple processes, each with multiple workflows, can lead to 
a combinatorial explosion of possible workflows through the new synthesized 
process. Even if each of the workflows in each of the component processes is 
well-structured, this may not be true of the synthesized workflows. Ensuring 
that every synthesized workflow is well-structured can thus be a daunting task, 
and the use of computer-based analysis tools can be of great value. 

What are the possible problems that could arise when several well- 
structured workflows are combined? To see this, consider the metagraph rep- 
resentation of workflows. As discussed earlier, a process can be represented 
as a metagraph with a single “Start” vertex and a single “End” vertex. If 
the process is well-structured, then there should be exactly one applicable 
workflow in each possible interpretation. This means that there should be
exactly one corresponding metapath in each interpretation as well. This can be
checked by enumerating the possible interpretations and, in each case, finding
all applicable metapaths M(Start, End) in the process metagraph. If for some
interpretation there is no such metapath, then the workflow corresponding to 
that interpretation has to be modified and/or augmented with tasks to achieve 
connectivity. Similarly, if there are multiple metapaths in any interpretation, 
then appropriate tasks from the process have to be modified and/or removed to 
eliminate the multi-determinacy. 

One way to visualize the synthesis of two processes is as the union of their
metagraph representations, with the start and end nodes being redefined as
appropriate. This in effect results in superimposing the identical elements and
adding the adjacency matrices. An interesting question then is whether each of
the component processes is still complete, in that the process can complete in
all feasible situations. In terms of the metagraph representation, this amounts
to checking whether there is a feasible workflow through the combined
metagraph under all relevant interpretations. From Theorem 6.1 of Chapter 6, we
know that the union of two metagraphs maintains full connectivity if it does
not introduce any cycles. This is a useful result, because it implies that
preservation of full connectivity can be verified by checking for acyclicity of
the combined metagraph. This in turn is very easy to do, since the closure of the
adjacency matrix of an acyclic metagraph would have no elements in the main
diagonal.

To illustrate this idea, we can use the example of loan evaluation processes. 
Consider the Loan Risk Process represented by the metagraph S_1 illustrated in
Figure 9.6. 

The pure inputs to this process are AC, AP, AV, and LA, and the process 
has only one pure output, namely LR, while CR is an intermediate element. 
Since there are no propositions, there is only one interpretation, and there is 
only one metapath from the pure inputs to the pure output. Thus, the process 
is fully connected, and also acyclic. 

Now consider the Loan Decision Process represented by the metagraph S_2
in Figure 9.7. The pure inputs of this process are PD, CB and LR, and the pure 
outputs are LD and LA. There are two propositions, ?RL and ?RH, which are 
related through the constraint R that exactly one of these propositions be true 
and the other be false. Thus, there are two possible workflows. In each case, 
the pure inputs are the same. If ?RL is true, then the pure output is LD, and 
there is a single metapath from the pure inputs to the pure output. If ?RL is 
false (and therefore ?RH is true), then LA is the pure output, and there is a 
single metapath from the pure inputs to the pure output. Again, the process is 
fully connected, and also corresponds to an acyclic metagraph. 

What happens if we combine these two processes? The resulting process 
can be represented by the union of the two metagraphs in Figures 9.6 and 
9.7, and is shown in Figure 9.8. Given that both the component metagraphs 
were fully connected, is this also true of the combined metagraph? Interest- 
ingly, this is not the case. To see this, consider the interpretation where ?RH 
= TRUE & ?RL = FALSE. It turns out that in this interpretation, there is no 
metapath from the “Start” node of Figure 9.8 to its “End” node. 

property data; ?RH - risk high?; ?RL - risk low?
Figure 9.8. The combined metagraph S_3 = S_1 ∪ S_2: r_1 - risk analyst; r_2 - loan officer; r_3 - loan clerk.

The reason for this is also interesting, and can be determined by testing
the combined metagraph for cyclicity. In this simple example, it is easy to see
that there is indeed a cycle in the metagraph, through the elements LA, LR,
and ?RH, and this cycle can be identified by non-empty diagonal members
of the closure matrix A*. Thus, in the above interpretation, the synthesized
process would fail to complete, even though the component processes would
each be able to complete. Also, the cause is a cycle that was introduced by the
synthesis itself.
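The cyclicity test can be sketched as follows. The edge lists `s1` and `s2` are assumptions: they were chosen to agree with the pure inputs, pure outputs and propositions described in the text, but they are not copied from Figures 9.6 and 9.7, and the closure is computed at the level of individual elements rather than with the full metagraph adjacency matrix A*. An element that can reach itself corresponds to a non-empty diagonal entry.

```python
def element_closure(edges):
    """Map each element to the set of elements reachable from it."""
    elements = set()
    for invertex, outvertex in edges:
        elements |= invertex | outvertex
    reach = {x: set() for x in elements}
    for invertex, outvertex in edges:
        for x in invertex:
            reach[x] |= outvertex
    for k in elements:
        for i in elements:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach

def cyclic_elements(edges):
    """Elements lying on a cycle, i.e. the non-empty diagonal of the closure."""
    closure = element_closure(edges)
    return {x for x, targets in closure.items() if x in targets}

# Assumed edges for the Loan Risk Process (pure inputs AC, AP, AV, LA).
s1 = [({"AC", "AP"}, {"CR"}), ({"CR", "AV", "LA"}, {"LR"})]
# Assumed edges for the Loan Decision Process (pure inputs PD, CB, LR).
s2 = [
    ({"LR"}, {"?RL", "?RH"}),
    ({"?RL", "PD", "CB"}, {"LD"}),
    ({"?RH", "PD", "CB"}, {"LA"}),
]

print(cyclic_elements(s1), cyclic_elements(s2))
print(sorted(cyclic_elements(s1 + s2)))
```

With these assumed edges, each component is acyclic while the union contains a cycle through LA, LR and ?RH, which is the situation the text describes.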

Even this simple example illustrates the potential value of constructing a
formal analytical tool for testing the effects of process synthesis. Each of the
steps in this analysis can be automated, and thus a metagraph-based tool can
alert process designers to potential problems before they are implemented. At
the same time, it is important to recognize that the presence of a cycle does not
necessarily imply a problem with the process. In our example, if we specify
an initial value for LA and then iterate through the evaluation process until a
value for LA is reached for which ?RL = TRUE, the process ultimately exits
the cycle. For example, we can use PD to calculate an initial value for LA.
We then use LA, CR, and AV to calculate LR. If the value of LR is too high
so that ?RH = TRUE, we reduce LA using ee (for instance, the task ee might
reduce the loan amount by a fixed percentage, say 20%), and the revised loan
is evaluated. If ?RH = FALSE (and therefore ?RL = TRUE), then the process
terminates with a loan approval decision. Otherwise, es is used again to further
reduce the LA and the cycle repeats (the cycle always terminates since the risk
of a loan amount near or at zero is implicitly low).
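The terminating iteration through the cycle can be illustrated with a short loop. Everything numeric here is hypothetical: the risk function, the threshold separating ?RL from ?RH, and the initial loan amount are invented for the sketch, and only the 20% reduction step comes from the example in the text.

```python
def evaluate_loan(initial_amount, risk_of, threshold, reduction=0.20, max_rounds=100):
    """Reduce the loan amount until the risk is low; return (amount, rounds)."""
    amount = initial_amount
    for rounds in range(max_rounds + 1):
        if risk_of(amount) <= threshold:  # ?RL = TRUE: exit the cycle
            return amount, rounds
        amount *= 1.0 - reduction         # ?RH = TRUE: reduce LA and re-evaluate
    raise RuntimeError("risk did not fall below the threshold")

# Hypothetical risk model: risk grows linearly with the loan amount.
def toy_risk(loan_amount):
    return loan_amount / 100_000

amount, rounds = evaluate_loan(100_000, toy_risk, threshold=0.5)
print(rounds, round(amount, 2))
```

The `max_rounds` guard is a design choice of the sketch: it turns a risk model that never reaches the low-risk region into an explicit error instead of an endless loop.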

We have learned from this example that the synthesis of well-structured 
processes can lead to a composite process that is not well-structured because 
the synthesis operation may introduce a cycle into the composite process. On 
the other hand, the introduction of cycles need not always prevent the compos- 
ite process from being well structured. 

Another aspect of well-structuredness of processes is non-redundancy. As 
discussed earlier, a process is non-redundant if it has at most one feasible 
workflow through it in each feasible interpretation. Non-redundancy is a de- 
sirable feature, since it ensures that the process is both deterministic and pre- 
dictable. In the metagraph representation, this amounts to ensuring that there 
is at most one metapath from the start to the end node of a process metagraph 
in each context, which can be verified using structured procedures for finding 
metapaths. 

In the context of process synthesis, an interesting question is whether the
synthesis of two non-redundant metagraphs is also non-redundant. It turns out
that this is always true when the component metagraphs have pairwise disjoint
outvertices. Considering again the example represented in Figures 9.6-9.8,
we see that this property is indeed satisfied for the two component
metagraphs, implying that the combined metagraph is also non-redundant.
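The disjointness condition is simple to check mechanically. The sketch below assumes a metagraph is stored as a list of (invertex, outvertex) pairs of Python sets, and reads "pairwise disjoint outvertices" as meaning that no element is produced by more than one edge of the combined edge list; the element names are hypothetical.

```python
from itertools import combinations

def shared_outputs(edges):
    """Return the elements that appear in the outvertex of more than one edge."""
    shared = set()
    for (_, out_a), (_, out_b) in combinations(edges, 2):
        shared |= out_a & out_b
    return shared

def outvertices_pairwise_disjoint(edges):
    """True if no two edges produce the same element."""
    return not shared_outputs(edges)

# Hypothetical component processes.
process_a = [({"x1"}, {"x2"})]
process_b = [({"x2"}, {"x3"})]
process_c = [({"x4"}, {"x3"})]  # also produces x3

print(outvertices_pairwise_disjoint(process_a + process_b))
print(shared_outputs(process_a + process_b + process_c))
```

Returning the offending elements, rather than only a yes/no answer, makes it easier to see which outputs would need to be reconciled before synthesis.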
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8. DECOMPOSITION OF PROCESSES AND 
IMPLICATIONS FOR ORGANIZATIONAL 
DESIGN 

Once we have created a synthesized (or aggregated) workflow, using the
approach of the previous section, we may wish to decompose this aggregate
by removing some of the activities and treating them as separate workflows.
For example, we may wish to create a separate management structure for these
workflows or to outsource them. There are many managerial and economic
reasons for such a decomposition and/or outsourcing, but they are beyond the
scope of this book. However, regardless of the motivation, it is important to
ensure that any such changes do not disrupt the overall process(es), and we
can address this problem using the property of sub-metagraph independence.
The result of the decomposition would be that the remaining metagraph would
be smaller, and potentially simpler in structure, which in turn would result in
a process that is simpler and easier to manage.

Note that it may be useful to extract either a single workflow in some cases,
or an entire process in other cases, from a larger containing process. For
example, we may wish to outsource risk assessment of a loan in a particular
situation (e.g., when the customer does not have an account at the bank);
alternatively, it may be appropriate to outsource the risk assessment process as
a whole. That is, we may decide to outsource the union of all of the
workflows making up the risk assessment process. We assume that a workflow can
safely (i.e., without disrupting any other workflow) be extracted from a
containing process if the metagraph representing the workflow is independent of
the containing metagraph. Then it follows that in order to extract an entire
(sub)-process, we would need to consider whether we can extract the union of
all of its component workflows.

There are two issues here. The first is whether the union of two
independent workflows - that is, two separate workflows that are each independent
of the entire aggregate of workflows - will also be independent of the entire
aggregate. If that were not the case, then it would be necessary to consider
each workflow in turn, and extract it only if it were independent of the current
aggregate.

With regard to independence of the union of two independent workflows,
we have seen in Chapter 6 (Theorem 6.2) that independence is preserved in this
case. In other words, bundling (merging) two independent subprocesses for
subsequent decomposition will result in an aggregate independent subprocess.
In addition, according to Theorem 6.3 the subprocesses will be independent
of each other. We note, however, that non-redundancy is not necessarily
preserved under these conditions. In other words, each workflow might contain
procedures for non-redundantly calculating the same information element, but the
resulting union could be redundant.

Table 9.2. Independence of submetagraph and RIM

The second issue is the intersection of two workflows. We may wish to 
create a new and smaller workflow by pulling out the tasks common to two 
workflows. If the two workflows are independent, the question is whether
independence will be preserved in the intersection. We have seen in Theorem 6.4
that one can create a new workflow by taking all activities common to two
independent workflows and decomposing the resulting set of common activities.
In this case the independence property will be preserved. However, as we have 
also seen, the property of full connectivity will not necessarily be preserved. 

The concept of independence also provides guidelines for process decom- 
position. Consider a process that is a candidate for decomposition. There are 
two issues here. The first is whether a workflow, as represented by a submeta- 
graph, is independent of the aggregate process - that is, whether it is an ISMG. 
The second is whether the collection of resources for this submetagraph, in 
the form of the corresponding resource interaction metagraphs (RIMs) is also 
an ISMG of the aggregate RIM. There are four possibilities, as illustrated in 
Table 9.2. 

The first possibility, termed decomposable organization in Table 9.2, oc- 
curs when both the submetagraph and the RIM are ISMGs. In this case the 
corresponding workflow is a good candidate for decomposition. Of course, 
there may be other criteria for treating the decomposed workflow separately. 
Certain economic and managerial issues should be considered here, as well 
as traditions and issues of corporate culture. However, this joint independence
suggests that there are no structural (or process-specific) impediments to
decomposition.

The second possibility, termed matrix organization in Table 9.2, occurs 
when the submetagraph is an ISMG but the RIM is not an ISMG. In this case 
the tasks are separable but the process resources are shared with other parts of 
the resource aggregate. This suggests the use of a matrix structure in which the 
decomposed sets of tasks (corresponding to the ISMGs) are managed by separate
project managers. Resource managers would assign the resources centrally to
the various projects.

The third possibility, termed modular workgroups in Table 9.2, occurs when 
the submetagraph is not an ISMG but the RIM is an ISMG. In this case the 
tasks contained in the subprocess use specialized resources but interact with 
other tasks outside the subprocess. In this case, the resources can be organized 
in a module (e.g., a workgroup). Thus, even though the tasks performed within 
the subprocess require interaction in the form of inputs and outputs with other 
tasks outside the subprocess, the module/workgroup itself requires less coordi- 
nation with external resources, since only certain resources interact with other 
resources. 

The fourth possibility, termed monolithic organization in Table 9.2, occurs 
when neither the submetagraph nor its RIM is an ISMG. In this case the orga- 
nization cannot be decomposed for structural reasons. Of course, there may be 
compelling reasons to decompose because of personalities, organizational cul- 
ture, economics, geographical locations, traditions, or other reasons. However, 
process managers should realize that there are structural arguments against de- 
composition. 
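The four possibilities amount to a lookup on two yes/no tests, which the following sketch encodes. The function and parameter names are hypothetical; the four labels are those used in the text for Table 9.2, and the two Boolean inputs are assumed to come from separate ISMG tests on the workflow submetagraph and on its RIM.

```python
def organizational_form(submetagraph_is_ismg: bool, rim_is_ismg: bool) -> str:
    """Map the two independence tests to the organizational form they suggest."""
    if submetagraph_is_ismg and rim_is_ismg:
        return "decomposable organization"  # tasks and resources both separable
    if submetagraph_is_ismg:
        return "matrix organization"        # tasks separable, resources shared
    if rim_is_ismg:
        return "modular workgroups"         # resources separable, tasks interact
    return "monolithic organization"        # neither is separable

for sub_ok in (True, False):
    for rim_ok in (True, False):
        print(sub_ok, rim_ok, "->", organizational_form(sub_ok, rim_ok))
```

The result is a structural suggestion only; as the text notes, economic, cultural or geographical considerations may still override it.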

We note that these independence conditions must hold for all interpretations 
of the conditional metagraph representing the process in question. However, 
in some cases there may be only partial independence - that is, independence 
in some interpretations but not in others. In this case, process managers and 
analysts may select the organizational alternatives suggested above for most 
instances and be prepared to override them in the special circumstances. 

We illustrate how an analysis of resource interactions can augment the
analysis of metagraph independence for process decomposition, using our
loan evaluation example in Figure 9.8. The edge designations appear above
the appropriate edge in Figure 9.8, and the resources used in the various tasks
are indicated in the figure below the appropriate edge. We can see that there
are three resources - r_1: a risk analyst, r_2: a loan officer and r_3: a loan
processing clerk. The RIM corresponding to Figure 9.8 is shown in Figure 9.9. The
edge labels in Figure 9.9 indicate the nature of the resource interactions. For
instance, the label (CR)(e_1, e_2) on the edge from {r_2, r_3} to {r_1} indicates
that the loan officer and loan clerk compute CR in the task e_1 and provide it to
the risk analyst for use in edge e_2.

From the RIM, it is apparent that the risk analyst (r_1) always works alone,
while the other resources work both alone as well as with each other. Thus,
a potential candidate for decomposition is the set of edges using r_1. We can
identify these edges from either the RIM (source edges in labels of all edges
emanating from r_1), or from the process metagraph. The edges are e_2, e_4 and
e_5. What next needs to be examined is whether these three edges form an
ISMG.
If e_2, e_4 and e_5 are extracted as a separate sub-process, we get the metagraph
shown in Figure 9.10, where the extracted sub-process is represented by the
new edge e'. Unfortunately, this is not an ISMG, since LR violates output
dependency. However, this occurs in only one of the two possible interpretations
of the process (i.e., when ?RH = TRUE and therefore ?RL = FALSE), and
thus represents a case of partial independence. In other words, for all low-risk
cases (?RL = TRUE), the risk assessment sub-process represented by e_2, e_4
and e_5 can be extracted as a separate process, since e_5 does not apply and thus
e' corresponds to an ISMG. However, for high-risk cases (?RH = TRUE), the
fact that LR is involved in the sub-process represented by e' is lost, so that S'
is not an accurate representation of the workflow.

9. REPRESENTING TIME-CRITICAL WORKFLOWS 
WITH ATTRIBUTED METAGRAPHS 

It is also possible to attach quantitative (numerical) attributes to metagraph 
edges. The purpose of this would be to allow certain calculations to be per- 
formed. For example, if the attributes are the costs of the tasks represented 
by the edges, then these attributes can be used to determine the total cost of 
the tasks appearing in a workflow. If the attributes represent the durations of 
the tasks, then they can be used to calculate the duration of the workflow in 
a fashion similar to the PERT/CPM calculations used in project management. 
If the attributes represent measures of performance, such as degrees of relia- 
bility or probabilities of non-failure, then they can be used to determine the 
performance of the workflow. 
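As a small illustration of such calculations, the sketch below attaches a cost and a reliability to each task and aggregates them over a workflow. The task names and all numbers are hypothetical, and multiplying the reliabilities assumes that every task in the workflow must succeed and that task failures are independent, which the text does not state.

```python
import math

# Hypothetical edge attributes: cost and probability of non-failure per task.
task_attributes = {
    "t1": {"cost": 100.0, "reliability": 0.9},
    "t2": {"cost": 250.0, "reliability": 0.5},
    "t3": {"cost": 50.0, "reliability": 1.0},
}

def workflow_cost(attributes, workflow):
    """Total cost of the tasks appearing in the workflow."""
    return sum(attributes[task]["cost"] for task in workflow)

def workflow_reliability(attributes, workflow):
    """Probability that every task succeeds, assuming independent failures."""
    return math.prod(attributes[task]["reliability"] for task in workflow)

workflow = ["t1", "t2", "t3"]
print(workflow_cost(task_attributes, workflow))
print(workflow_reliability(task_attributes, workflow))
```

Durations do not simply add in this way, because tasks can run in parallel; they call for a critical-path style calculation instead.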

These attributes might also be combined. For example, if certain numerical
attributes represent both time (i.e., activity durations) and cost, then these
attributes might be combined to perform time/cost tradeoffs. If they represent
either cost or duration along with probability of non-failure, then they might
be used to determine the probability distributions of workflow cost or duration,
depending on what will be done if a task represented by an edge should fail.
In this chapter we will focus on deterministic activity durations, and we will
not consider time/cost tradeoffs.

Figure 9.10. The metagraph S'.

Figure 9.11. Time-constrained workflow metagraph: ACst - annual operating cost; CEst -
mfg. cost estimates; DSpec - design specifications; LCC - total life cycle cost; LCOC - life
cycle operating cost; MRep - mileage report; MuPol - mark-up policy; PSpec - production
specifications; SalP - sales price; ServL - service life estimate.

In this section, we discuss how metagraph-based analysis of workflows can
help identify critical activities and time-critical information elements in each
workflow. This knowledge can then be used to manage the resources allocated
to each activity in the workflow, and ultimately, in the underlying business
process.

Consider a workflow metagraph in which each edge is labeled with its
expected duration. An example of such a metagraph is shown in Figure 9.11. The
generating set consists of ten elements, each describing a document used in a
life cycle costing workflow for a vehicle. There are five edges, each
representing a task in the workflow. The edge e_1 represents a task that converts
design specifications (DSpec) and production specifications (PSpec) into
manufacturing cost estimates (CEst) and an estimate of service life (ServL). The
edge e_2 represents a task that converts the vehicle’s mileage report (MRep) into
yet another estimate of service life and also an estimate of annual operating
cost (ACst). The edge e_3 represents a task that uses the manufacturing cost
estimate and the company’s markup policy (MuPol) to produce a sales price (SalP),
and the edge e_4 uses the estimates of service life and annual operating cost to
calculate the life cycle operating cost (LCOC). Finally, e_5 uses the sales price
and the life cycle operating cost to calculate the total life cycle cost (LCC).

Given the source and target specification for this workflow, it is possible 
that the actual time available for some of the tasks is greater than the expected 
time, while other tasks are critical, in the sense that they have to be completed 
within very tight time constraints. We can identify the tasks and information 
elements in each category, using analysis similar to PERT/CPM methods in 
project management (Moder, Phillips and Davis, 1983). However, since the 
A* matrix is available for the metagraph representation, it can be exploited in 
the process. 

We treat cases where the outvertices of two or more edges overlap in a
manner consistent with traditional project management approaches such as
PERT/CPM. That is, when a particular information element is computed by
multiple tasks in a workflow, its value is determined only after all these
activities have completed. For example, in Figure 9.11, the value of ServL can be
used as an input to another activity (e.g., activity e_4) only after both
activities e_1 and e_2 have been completed. Given any acyclic metapath from a
source set B to a target set C, critical elements and activities can be found as
follows:

Procedure Critical-Path($M$, $B$, $C$) 

Phase 1: Early times 

For each element $x_i$ in $B$, assign the label $Q_i = 0$, and mark $x_i$ as live; let 
$Q_i = 0$ for all other elements. 

Let $E = M(B, C)$ 

While $E \neq \emptyset$, for each edge $e_j$ in $E$ such that all elements in the invertex of 
$e_j$ are live, do 

Let $T_j = \max_{x_i \in V_{e_j}} (Q_i)$ 

For each $x_k \in W_{e_j}$, set $Q_k = \max[Q_k, (T_j + d_j)]$, and mark it as live if it is 
not already so. 

Set $E = E \setminus \{e_j\}$ 

Repeat 

Let $T^{*} = \max_{x_i \in X} Q_i$ 

Phase 2: Late times 

Assign $L_i = T^{*}$ for all elements $x_i$ in $C$, and mark them as live; set $L_i = \infty$ for 
all other elements. 

Let $E_0 = M(B, C)$ 

While $E_0 \neq \emptyset$, for each edge $e_j$ in $E_0$ such that all elements in the outvertex 
of $e_j$ are live, do 

Let $T_j = \min_{x_i \in W_{e_j}} (L_i)$ 

For each $x_k \in V_{e_j}$, set $L_k = \min[L_k, (T_j - d_j)]$ and mark it as live if it is not 
already so. 

Set $E_0 = E_0 \setminus \{e_j\}$ 

Repeat 

Phase 3: Critical elements 

For all $x_i \in X$, if $Q_i = L_i$ then $x_i$ is marked as critical, with completion time 
$T_i = Q_i = L_i$. 

END. 
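
The three phases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `critical_path`, the dictionary layout of the edges, and all task durations are hypothetical. The invertex and outvertex of $e_1$ are inferred from the critical vertices discussed for Figure 9.12, and the durations were chosen only so that the labels agree with the times quoted in the text for ServL and CEst. In keeping with the overlap rule stated before the procedure, the sketch treats an element as live only once every edge that produces it has been processed, and in the backward pass it processes an edge only once no remaining edge still reads its outputs.

```python
from math import inf


def critical_path(edges, sources, targets):
    """Return (early, late, critical) labels for an acyclic metapath.

    edges maps an edge name to (invertex, outvertex, duration).
    """
    elements = set(sources) | set(targets)
    for v, w, _ in edges.values():
        elements |= v | w
    produced = set().union(*(w for _, w, _ in edges.values()))

    # Phase 1: early times Q, sweeping forward from the sources.
    early = {x: 0 for x in elements}
    pending = dict(edges)
    while pending:
        # Live = a source or produced element that no pending edge still writes.
        unfinished = set().union(*(w for _, w, _ in pending.values()))
        live = (set(sources) | produced) - unfinished
        ready = [e for e, (v, _, _) in pending.items() if v <= live]
        if not ready:
            raise ValueError("edges do not form an acyclic metapath from the sources")
        for e in ready:
            v, w, d = pending.pop(e)
            start = max((early[x] for x in v), default=0)
            for x in w:
                early[x] = max(early[x], start + d)
    total = max(early.values())

    # Phase 2: late times L, sweeping backward from the targets.
    late = {x: (total if x in targets else inf) for x in elements}
    pending = dict(edges)
    while pending:
        # An edge is processed once no pending edge still reads its outputs.
        unread = set().union(*(v for v, _, _ in pending.values()))
        ready = [e for e, (_, w, _) in pending.items() if not (w & unread)]
        if not ready:
            raise ValueError("edges contain a cycle")
        for e in ready:
            v, w, d = pending.pop(e)
            finish = min((late[x] for x in w), default=total)
            for x in v:
                late[x] = min(late[x], finish - d)

    # Phase 3: an element is critical when its early and late times coincide.
    critical = {x for x in elements if early[x] == late[x]}
    return early, late, critical


# Workflow edges with hypothetical durations (assumptions, not from the text).
EDGES = {
    "e1": ({"DSpec", "PSpec"}, {"CEst", "ServL"}, 5),
    "e2": ({"MRep"}, {"ServL", "ACst"}, 3),
    "e3": ({"CEst", "MUPol"}, {"SalP"}, 2),
    "e4": ({"ServL", "ACst"}, {"LCOC"}, 4),
    "e5": ({"SalP", "LCOC"}, {"LCC"}, 3),
}
SOURCES = {"DSpec", "PSpec", "MRep", "MUPol"}
TARGETS = {"LCC"}

early, late, critical = critical_path(EDGES, SOURCES, TARGETS)
for name in sorted(critical):
    print(name, early[name], late[name])
```

With these assumed durations the critical elements are DSpec, PSpec, ServL, LCOC and LCC, while CEst receives the labels (5, 7).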

Definition 9.1. An invertex $V$ is critical if $\max_{x_i \in V}(Q_i) = \min_{x_i \in V}(L_i)$. 
We note that we will always have $\max_{x_i \in V}(Q_i) \le \min_{x_i \in V}(L_i)$ as long as 
Procedure Critical-Path is used to label the metagraph. 

Theorem 9.1. An invertex V is critical if it contains any critical elements. 

Proof. Without loss of generality, assume that $V$ contains two elements, a 
critical element $a$ and a non-critical element $b$. Let $Q_a$, $L_a$, and $Q_b$, $L_b$ be the 
early times and late times respectively of these elements. Then 

$$\max_{x_i \in V}(Q_i) = \max[Q_a, Q_b], \qquad \min_{x_i \in V}(L_i) = \min[L_a, L_b].$$ 

Since $V$ is feasible, it follows that $\max[Q_a, Q_b] \le \min[L_a, L_b]$. However, 
since $a$ is critical, $Q_a = L_a$, and thus $\max[Q_a, Q_b] \le \min[Q_a, L_b]$. 

There are three possible cases: 

Case 1: $Q_a \le Q_b < L_b$. In this case, $\max[Q_i] = Q_b$ and $\min[L_i] = Q_a$, which 
makes the vertex infeasible unless $Q_b = Q_a$, in which case the vertex is critical. 

Case 2: $Q_b < Q_a \le L_b$. In this case, $\max[Q_i] = Q_a$ and $\min[L_i] = Q_a$, 
and the result follows. 

Case 3: $Q_b \le L_b < Q_a$. In this case, $\max[Q_i] = Q_a$ and $\min[L_i] = L_b$, 
and as in Case 1, the only feasible situation is when the vertex is critical. □ 

We note that critical elements must be contained in a critical invertex, except 
for those in the final outvertex, in this case LCC. 

Definition 9.2. The slack in an edge $e$ with duration $d_e$ is defined as 

$$\mathrm{slack}(e) = \min_{x_i \in W_e}(L_i) - \max_{x_j \in V_e}(Q_j) - d_e.$$ 

Definition 9.3. An edge is defined as critical if it has no slack. 

Theorem 9.2. If the invertex of an edge $e$ contains a critical element $a$ with 
completion time $T_a$ and the outvertex of the edge has a critical element $b$ with 
completion time $T_b$ such that $T_b - T_a = d_e$, then $e$ is critical. 

Proof. Since $a$ and $b$ are critical, it follows that $\min_{x_i \in W_e}(L_i) \le T_b$ and 
$\max_{x_j \in V_e}(Q_j) \ge T_a$. 

Thus, $\min_{x_i \in W_e}(L_i) - \max_{x_j \in V_e}(Q_j) \le T_b - T_a = d_e$, so that $\mathrm{slack}(e) \le 
d_e - d_e = 0$, which proves the result. □ 
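
A short sketch of the slack computation may help here. It assumes that slack is taken net of the task duration $d_e$, which is the reading under which the proof above yields $\mathrm{slack}(e) \le d_e - d_e$. The function name `slack`, the durations, and the early and late labels below are hypothetical values chosen for illustration; they are not taken from Figure 9.12.

```python
def slack(invertex, outvertex, duration, early, late):
    """Slack of an edge: latest finish minus earliest start minus duration."""
    latest_finish = min(late[x] for x in outvertex)
    earliest_start = max(early[x] for x in invertex)
    return latest_finish - earliest_start - duration


# Hypothetical early (Q) and late (L) labels for the workflow elements.
EARLY = {"DSpec": 0, "PSpec": 0, "MRep": 0, "MUPol": 0, "CEst": 5,
         "ServL": 5, "ACst": 3, "SalP": 7, "LCOC": 9, "LCC": 12}
LATE = {"DSpec": 0, "PSpec": 0, "MRep": 2, "MUPol": 7, "CEst": 7,
        "ServL": 5, "ACst": 5, "SalP": 9, "LCOC": 9, "LCC": 12}

# Edges as (invertex, outvertex, assumed duration).
EDGES = {
    "e1": ({"DSpec", "PSpec"}, {"CEst", "ServL"}, 5),
    "e2": ({"MRep"}, {"ServL", "ACst"}, 3),
    "e3": ({"CEst", "MUPol"}, {"SalP"}, 2),
    "e4": ({"ServL", "ACst"}, {"LCOC"}, 4),
    "e5": ({"SalP", "LCOC"}, {"LCC"}, 3),
}

slacks = {name: slack(v, w, d, EARLY, LATE) for name, (v, w, d) in EDGES.items()}
critical_edges = sorted(name for name, s in slacks.items() if s == 0)
print(slacks)
print(critical_edges)
```

Under these assumed labels, edges e1, e4 and e5 have zero slack, while e2 and e3 each have two time units of slack.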

Theorem 9.3. Each critical edge lies on a critical simple path from some 
element in B to some element in C. 

Proof. Since the edges under consideration are all part of a metapath from 
B to C, each critical edge is on some simple path from an element in B to 
another in C. Now consider all such simple paths passing through a given 
critical edge e. If none of these paths is critical, then there is non-zero slack 
on all of them, which in turn means that e must have non-zero slack, which 
contradicts the claim that e is critical. Thus, at least one of these paths must 
have zero slack, and thus is critical. □ 

We can illustrate these concepts and how they can be used, using the 
example workflow metagraph in Figure 9.11. Applying Procedure Critical-Path 
to this metagraph, we get a labeling of the early and late times for each 
element as illustrated in Figure 9.12. From Theorem 9.1, we can then infer that the 
critical vertices are {DSpec, PSpec}, {CEst, ServL}, {SalP, LCOC} and {LCC}. 
The critical edges in the workflow are then $e_1$, $e_4$, and $e_5$ (by Theorem 9.2) and 
thus the critical path through the workflow is ($e_1$, $e_4$, $e_5$) (by Theorem 9.3). 

Once the critical path is known, scheduling and resource allocation of tasks 
can be done more effectively. For example, although task $e_1$ is critical, not all 
of its outputs are critical - that is, although the service life estimate (ServL) 
must be ready at the earliest possible time (i.e., time = 5), the manufacturing 
cost estimates (CEst) can be generated anytime between times 5 and 7. Thus, 
our analysis helps workflow managers not only with inter-task scheduling but 
also with intra-task scheduling. 

These results can also be useful in resource allocation. The metagraph 
analysis enables managers to determine that any resources for tasks $e_2$ and 
$e_3$ can be reallocated as long as they are not critical - that is, as long as 
the reduced resources do not extend the task durations more than the existing 
slacks in the edges. Furthermore, even within critical tasks such as $e_1$, resources 
should be allocated to service life estimation with higher priority than to 
manufacturing cost estimation, unless other factors (e.g., quality or performance) 
become significant. 

Figure 9.12. The workflow metagraph in Figure 9.11 with critical elements and critical path. 
The numbers in parentheses beneath each element $x_i$ denote ($Q_i$, $L_i$) for that element, the thick 
arrows denote the critical edges, and boldface denotes critical elements. 




Chapter 10 

CONCLUSION 

We have now completed our presentation of metagraph theory and 
applications. We began by defining a metagraph as a collection of directed set-to-set 
mappings, where the sets are subsets of a generating set, at most one of the 
sets in any edge is null, and for any edge the two sets defining the edge are 
disjoint. We then developed an algebraic theory of metagraphs and applied it 
to metagraph connectivity, metagraph transformations (especially projection), 
assignment of attributes to edges, assumptions and conditional metagraphs, 
and the properties of sub-metagraphs. Finally we examined the application of 
metagraphs to the structuring of decision support (i.e., data, model, and rule 
management) systems and workflow systems. 

We conclude in this chapter by addressing three topics. The first is the meta- 
graph modeling process. We center our attention on a previously proposed 
model development life cycle, similar to the systems development life cycle 
used in information systems analysis and design, and discuss its application 
to metagraphs. The second is the construction of a metagraph workbench - 
a computer-based tool for constructing metagraphs in a variety of contexts. 
The third topic is the possible application of metagraphs to a quite different 
area not discussed previously - social networks. 

1. THE METAGRAPH MODELING PROCESS 

The metagraph modeling process, like other information systems develop- 
ment processes, is accomplished by means of a set of largely sequential but oc- 
casionally overlapping design tasks. These tasks are often called stages. These 
stages have been studied in detail for general information processing systems 
in the form of a systems development life cycle or SDLC (Dennis and Wixom, 
2000; Hoffer, George and Valacich, 1999). They have also been studied more 
specifically for a more specialized process, model construction, in the form 
of a model development life cycle, or MDLC (Blanning, 2003). We explain 
briefly the MDLC and interpret it in the context of metagraphs. 

A life cycle is a sequence of stages that are needed to accomplish a goal. 
Although the stages are presented as a sequence, it is understood that there will 
be a certain amount of overlap as the life cycle is implemented, because earlier 
stages may have to be revisited after later ones have been initiated. Typically, 
the first stage is a recognition that a problem exists and requires the initiation of 
the remainder of the cycle, and the last stage assumes that a system has been 
designed and implemented and must now be maintained and possibly 
modified. The intermediate stages may include feasibility study, systems analysis, 
and systems design. 

The life cycle concept is of special interest because it helps to structure our 
thinking about the underlying processes and it often provides a framework for 
preparing progress reports. Very often these stages require a formal statement 
or signoff by the customers of the system, and thus provide a formal channel 
of communication between designers and users. This is of special importance 
because it is believed that the principal cause of system failure is lack of user 
acceptance, and that this in turn is the result of poor communication between 
designers and users. In addition, mistakes made during the early stages may 
not become manifest until the later stages, and an understanding of the cycle 
may be helpful in identifying these mistakes. 

The Metagraph Development Life Cycle (MDLC) is illustrated in the “wa- 
terfall” diagram of Figure 10.1. The first stage is Problem Identification. This 
is the identification of an information processing problem and a determina- 
tion of whether metagraphs provide a reasonable foundation for describing an 
information processing system for addressing the problem. This consists of 
identifying (1) the entities of interest, including both the information elements 
and the assumptions that make up the generating set, (2) the entity aggregates 
that will make up the invertices and outvertices of the metagraph edges, and 
(3) the ordered pairs that will define the metagraph edges, including the 
resources (labels) needed to implement the models or tasks represented by the 
edges. 

The second stage, Metagraph Construction, consists of three substages. The 
first is identification of the generating set. The elements in the generating set 
will be the data attributes, rule propositions, model variables, and/or workflow 
workstations that make up the system being represented. The second substage 
is the aggregation of these elements into clusters that will become invertices 
and outvertices. The third substage is the identification of metagraph edges 
that represent relationships between these clusters and the directions of these 
relationships. 
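
The three substages can be sketched as a small data structure. The sketch below is illustrative only: the class name `Metagraph`, the method `add_edge`, and the sample elements are hypothetical. The checks encode the conditions from the definition recalled at the start of this chapter: both sets of an edge are subsets of the generating set, at most one of them is null, and the two are disjoint.

```python
from dataclasses import dataclass, field


@dataclass
class Metagraph:
    """A generating set plus named edges (invertex, outvertex)."""
    generating_set: frozenset
    edges: dict = field(default_factory=dict)

    def add_edge(self, name, invertex, outvertex):
        v, w = frozenset(invertex), frozenset(outvertex)
        # Substage 2: both clusters must be drawn from the generating set.
        if not (v | w) <= self.generating_set:
            raise ValueError("edge uses elements outside the generating set")
        # At most one of the two sets may be null.
        if not v and not w:
            raise ValueError("invertex and outvertex cannot both be empty")
        # The invertex and outvertex must be disjoint.
        if v & w:
            raise ValueError("invertex and outvertex must be disjoint")
        # Substage 3: record the directed relationship between the clusters.
        self.edges[name] = (v, w)


# Substage 1: identify the generating set (hypothetical element names).
mg = Metagraph(frozenset({"price", "volume", "revenue", "expense", "profit"}))
mg.add_edge("e1", {"price", "volume"}, {"revenue"})
mg.add_edge("e2", {"revenue", "expense"}, {"profit"})

try:
    mg.add_edge("bad", {"price"}, {"price", "profit"})
    rejected = False
except ValueError:
    rejected = True
print(sorted(mg.edges), rejected)
```

Validating each edge as it is added keeps construction errors from propagating into the later analysis stages.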

The third stage, Requirements Analysis, consists of two subtasks. The first 
is structural analysis, which means the specification of information structures 
- such as metapaths, projections, and contexts - that will be useful in decision 
making. Of course others can be added as needed, but an understanding of the 
structures that might be needed may help in determining the information ele- 
ments (including assumptions and labels) that should be included in the system 
and the relationships (edges) among these elements. The second subtask is a 
feasibility study, which usually consists of four components - technical 
feasibility, economic feasibility, operational feasibility, and schedule feasibility. 

Figure 10.1. The metagraph development life cycle. 



The fourth stage, Model Construction and Implementation, consists of the 
construction and implementation of the decision models, stored data relations, 
business rules, and workflows that make up the system. This is the stage in 
which much of the “real work” - that is, the data collection, programming, 
testing, and documentation - takes place. One potential mistake, which should 
be avoided, is to jump into this stage before the earlier stages, which lay 
the groundwork for this stage, have been largely completed. 

The fifth stage is Maintenance and Modification. 
Maintenance consists of gradual changes in a decision model, 
data relation, business rule, or workstation. These may be caused by errors 
detected in data or software or minor changes in external conditions, such as 
government relations or the practices of suppliers, customers, or business part- 
ners. Modification is similar to maintenance except that the changes are more 
severe. These include such qualitative structural changes as the introduction of 
new production facilities (and therefore new components of a decision model), 
major changes in database or rule structures, and new workflow stations. 

The maintenance and modification stage is of special importance because 
much of the cost incurred in the life cycle occurs in this stage. Therefore, 
the first three stages should be conducted with this stage in mind. The data 
elements and the edges should be designed for maintainability - for example, 
element names should be intuitive and procedures and data relations should be 
well documented. In addition, a metagraph workbench of the type proposed in 
the next section may be helpful not only in metagraph construction, but also 
in metagraph maintenance and modification. 

2. TOWARDS A METAGRAPH WORKBENCH 

A metagraph workbench for DSS analysis would serve three purposes. First, it would 
serve as a testbed for analyzing the effectiveness of metagraphs as a representational 
construct for decision analysis and decision support. Second, building it would require the 
transformation of the analysis and design principles discussed in Section 1 of this 
chapter into structured procedures and algorithms. Finally, it would enable us to 
empirically test the viability of any metagraph-based decision process model. 
While this empirical testing is beyond the scope of this chapter (and this book), 
it is an important element of any long term research agenda in metagraphs. 

The architecture of the system is shown in Figure 10.2. The primary user 
interface is provided by the metagraph editor, which is a graphical user inter- 
face for drawing and using metagraphs. The user can create new metagraph 
edges or recall existing edges from the metagraph store, which maintains all 
known edges and elements. As each edge is added to a current metagraph, the 
system updates the corresponding algebraic representation (i.e., the adjacency 
matrix), and this would be transparent to the user. The current metagraph can 
be compiled at any time through construction of the transitive closure of the 
adjacency matrix (i.e., the A* matrix). Once this is done, the user can obtain 
additional information such as the presence of any bridges or cycles, and these 
are displayed in the graphical interface of the editor with color highlighting 
of the relevant edges. Thus, even during the process of constructing the meta- 
graph, the user can obtain analytical feedback about the DSS resources that 
are being included in the proposed system. This can help to identify what 
additional edges (resources) are needed, the potential disadvantages of elimi- 
nating any edges (e.g., the elimination of connectivity between certain inputs 
and outputs), and the multideterminacy conditions caused by cycles. 

The system has two internal storage components, a metagraph store and an 
assumptions store. The former is used to store all known edges in one or more 
metagraphs. Edges from one or more metagraphs can be combined into a new 
metagraph by using the metagraph store (if they are all stored in the same file 
or in a network of linked files). The metagraph store also enables two or more 
metagraphs to be combined (using metagraph addition) in the editor. The as- 
sumptions store is a propositional database used to maintain assumptions and 
their valuations. One promising extension is the use of a belief maintenance 
system as an enhancement to the assumptions store (Raghunathan, Krishnan 
and May, 1995). 

From the metagraph editor the user can access three special modules: the 
metapath builder, the projection builder, and the context builder. 

Figure 10.2. Structure of a metagraph-based tool. 



• The metapath builder constructs all possible metapaths between any two 
sets of elements. It uses algorithms based on the A* matrix that is main- 
tained in the metagraph store, once the current metagraph is compiled. 
The metapath builder can be accessed in several ways. For example, it can 
be used to construct one or all metapaths between any given element sets 
specified by the user from the editor. It can also be invoked by the pro- 
jection builder during the process of constructing projection views of the 
current metagraph. 

• The projection builder is used to construct simplified views of the current 
metagraph over a projection set (of elements) specified by the user. The 
projection is displayed as a simplified metagraph in a separate window. 
Since the process of defining a projection involves the identification of 
a set of dominant metapaths, the projection builder utilizes the metapath 
builder for this purpose. 

• The context builder is used to construct simplified views of the current 
metagraph that identify those edges that are applicable in a specific con- 
text (specified by a set of known true and false assumptions). As with a 
projection, the resulting metagraph can be displayed in a separate win- 
dow, although an alternative display is by superimposition upon the cur- 
rent metagraph, which emphasizes the fact that the context does not 
delete any models. The context builder utilizes the A matrix stored in the 
metagraph store, as well as assumptions maintained in the assumptions 
store. 

A metagraph workbench can be used on a stand-alone basis, with edges 
being entered manually by the user or maintained internally in the metagraph 
store. However, a natural extension is to integrate this system with DSS mod- 
ules such as a model base, data base, and rule base. In this way metagraph 
construction can be a part of the process of adding models, data, rules, and/or 
workflows to the DSS. In this case the metagraph representation and associ- 
ated tools serve as part of the front end to the DSS. An alternative is to view 
the metagraph-based system as part of the model selection component of an 
integrated modeling environment, as in (Banerjee and Basu, 1993). 
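
As an illustration of how one workbench module might operate, the following sketch mimics the context builder described above. It rests on assumptions of ours: an edge is treated as inapplicable when its invertex contains an assumption known to be false, assumptions known to be true are regarded as resolved, and no edge is deleted from the stored metagraph. The function name `build_context` and the sample edges and assumptions are hypothetical.

```python
def build_context(edges, assumptions, known_true, known_false):
    """Map each applicable edge name to its still-unresolved assumptions.

    edges maps an edge name to (invertex, outvertex); assumptions is the
    subset of the generating set that consists of propositions.
    """
    applicable = {}
    for name, (invertex, _outvertex) in edges.items():
        conditions = invertex & assumptions
        if conditions & known_false:
            continue  # a required assumption is false in this context
        applicable[name] = conditions - known_true
    return applicable


# Hypothetical conditional metagraph: names starting with "p_" are assumptions.
EDGES = {
    "e1": ({"price", "volume", "p_stable_market"}, {"revenue"}),
    "e2": ({"revenue", "expense"}, {"profit"}),
    "e3": ({"price", "p_promo"}, {"volume"}),
}
ASSUMPTIONS = {"p_stable_market", "p_promo"}

context = build_context(EDGES, ASSUMPTIONS,
                        known_true={"p_stable_market"},
                        known_false={"p_promo"})
print(context)
```

Because the function returns a view rather than modifying `EDGES`, the result could be superimposed on the full metagraph, in the spirit of the alternative display mentioned for the context builder.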

The development of a metagraph workbench would be facilitated by a de- 
velopment in metagraph algorithms not currently addressed in the literature: 
the identification of planar metagraphs and the representation of planar meta- 
graphs in visually planar form. In graph theory a planar graph is a simple graph 
or digraph that can be drawn on the plane without intersecting edges, and there 
are theoretical results for determining whether an arbitrary graph is planar. If 
it were possible to determine whether an arbitrary metagraph is planar and 
to render a planar metagraph onto the plane, the visualization of metagraphs, 
metapaths, projections, and contexts would be facilitated. 

3. METAGRAPHS AND SOCIAL NETWORKS 

Social networks are networks of individual people or groups of similarly 
situated people, typically departments in an organization (Cross, Parker and 
Sasson, 2003; Cross and Parker, 2004). Social network analysis is a collec- 
tion of concepts and techniques for representing (usually in the form of sim- 
ple graphs or digraphs) and analyzing social networks. The resulting concepts 
concern such topics as kinship, friendship, trust relationships, collaboration, 
and the sharing and diffusion of information. Of special interest are people 
and links that are especially prominent - for example, people who have many 
relationships (e.g., acquaintances, collaborators, etc.) compared with others in 
the organization and ties that are especially strong. The latter would be repre- 
sented by a labeled metagraph. 

Another important concept, in the context of organizations, is the difference 
between a hierarchy and a more general network. A hierarchy, often repre- 
sented by an organization chart, represents the formal structure of an organi- 
zation, such as reporting relationships and groupings of similarly specialized 
people and people with similar organizational attributes. People with similar 
attributes are said to be homophilous, and social and organizational communi- 
cation often involves homophilous people. A network, on the other hand, rep- 
resents an informal organizational structure that often crosses organizational 
stovepipes and is said to describe how the work in an organization really gets 
done. The resulting simple graph or digraph is sometimes called a sociogram 
or sociometric diagram. 

The application of metagraphs to social networks may be helpful in 
crosswalking individuals and suborganizations - that is, in treating people both as 
separate individuals and members of various organizations and 
suborganizations. These people may communicate directly or through the organizational 
groups in which they are embedded. This would require a generalization of 
metagraphs to symmetric metagraphs, in which the edges are represented as 
unordered, rather than ordered, pairs of subsets of the generating set. This 
would allow for undirected links between vertices in the metagraph and would 
require new definitions and new algorithms concerning metapaths and meta- 
graph connectivity. 

Five other areas are as follows: 

• The first is cyclic metagraphs. We have briefly described cycles in 
metagraphs in the context of inference paths in rule-based systems. However, 
this concept may be even more prominent in social networks. Social and 
organizational relationships are often cyclic, because they represent 
iterative processes. An example is budgeting processes in organizations. 
However, cyclic metagraphs differ from cyclic simple graphs and digraphs 
in several respects. For example, the removal of certain edges from a 
cycle may leave the cycle intact, and algorithms for the identification of such 
edges may be of interest. 

• The second is metagraph isomorphism, which is analogous to graph iso- 
morphism. Two metagraphs are isomorphic if they are identical up to a 
relabeling of their generating sets and edges. Metagraph isomorphism 
may be useful in identifying two structurally equivalent organizations or 
other social systems. 

• The third is the representation of uncertainty and ambiguity in meta- 
graphs. This would presumably be accomplished by labeling edges and 
possibly elements with measures of uncertainty and ambiguity. Thus, we 
may envision stochastic metagraphs and/or fuzzy metagraphs that capture 
a lack of specificity in knowledge about social actors and their interac- 
tions. 

• The fourth area is metagraph coloring. Theorems concerning the coloring 
of simple graphs have been of substantial theoretical interest - for 
example, in solving the four-color problem concerning planar maps. An analysis 
of metagraph coloring would be far more complicated, because it 
would require consideration of what is to be colored - elements and/or 
edges. However, such a theory and its consequent applications may sug- 
gest ways in which disparate people and organizational subgroups may 
establish boundaries and encourage or resist cooperation. 







A. Basu and R. W. Blanning 



• The fifth area is the relationship between metapaths and metagraph cuts. 
In simple graphs a cut is a set of edges whose removal would disconnect 
two otherwise connected elements or vertices, and the maximum num- 
ber of edge disjoint paths between two connected vertices is equal to the 
minimum size of a cut separating the vertices. This has led to some inter- 
esting results concerning flows in graphs. It would be interesting to un- 
derstand any relationships between metapaths and metagraph cuts. This 
might suggest useful results concerning the flows of materials through 
metagraph-based logistical structures. 
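
To make one of these areas concrete, the following sketch tests two small metagraphs for isomorphism in the sense given in the second item above: identical up to a relabeling of the generating set, with edges compared as unlabeled pairs of sets. It is a brute-force illustration that tries every relabeling, so it is practical only for very small generating sets. The function names and the sample metagraphs are hypothetical.

```python
from itertools import permutations


def normalize(edges):
    """Represent edges as a set of (invertex, outvertex) frozenset pairs."""
    return {(frozenset(v), frozenset(w)) for v, w in edges}


def are_isomorphic(elements_a, edges_a, elements_b, edges_b):
    """True if some relabeling of elements_a maps edges_a onto edges_b."""
    edges_a, edges_b = normalize(edges_a), normalize(edges_b)
    names_a = sorted(elements_a)
    if len(names_a) != len(set(elements_b)) or len(edges_a) != len(edges_b):
        return False
    for image in permutations(sorted(elements_b)):
        rename = dict(zip(names_a, image))
        mapped = {
            (frozenset(rename[x] for x in v), frozenset(rename[x] for x in w))
            for v, w in edges_a
        }
        if mapped == edges_b:
            return True
    return False


# Two hypothetical organizations with the same structure, and one without.
ORG_A = ({"a", "b", "c", "d"}, [({"a", "b"}, {"c"}), ({"c"}, {"d"})])
ORG_B = ({"w", "x", "y", "z"}, [({"x", "y"}, {"z"}), ({"z"}, {"w"})])
ORG_C = ({"w", "x", "y", "z"}, [({"x"}, {"y", "z"}), ({"z"}, {"w"})])

same_ab = are_isomorphic(*ORG_A, *ORG_B)
same_ac = are_isomorphic(*ORG_A, *ORG_C)
print(same_ab, same_ac)
```

The exhaustive search grows factorially with the size of the generating set, which is one reason dedicated algorithms for metagraph isomorphism would be of interest.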

Thus, there are many promising opportunities for new areas of research and 
application in metagraphs, many possibly not yet anticipated. 

4. AND FINALLY 

We believe that the topic of metagraphs is a research goldmine. We have 
found this topic to be a font of stimulating ideas and mathematical results in 
several areas of information processing, including decision support (i.e., data 
management, model management, and rule management) systems and 
workflow systems. This topic also holds forth the promise of additional mathematical 
results, additional mathematical structures, and additional areas of application 
- for example, in the modeling of social and organizational systems and pos- 
sibly other systems as well. 

The reason seems to be that we are entering an age of connectivity. People 
want to communicate and establish systems of centralized, decentralized, and 
peer-to-peer networks involving people, their organizations, and decision 
support (data, models, rules) modules. Metagraphs may provide a helpful 
foundation for these networks. 